Abstract

Let $(X_t,Y_t)_{t\in\mathbb{T}}$ be a discrete or continuous-time Markov process with state space $\mathbb{X}\times\mathbb{R}^d$, where $\mathbb{X}$ is an arbitrary measurable set. Its transition semigroup is assumed to be additive with respect to the second component, i.e. $(X_t,Y_t)_{t\in\mathbb{T}}$ is assumed to be a Markov additive process. In particular, this implies that the first component $(X_t)_{t\in\mathbb{T}}$ is also a Markov process. Markov random walks or additive functionals of a Markov process are special instances of Markov additive processes. In this paper, the process $(Y_t)_{t\in\mathbb{T}}$ is shown to satisfy the following classical limit theorems:

(a) the central limit theorem, 

(b) the local limit theorem, 

(c) the one-dimensional Berry-Esseen theorem, 

(d) the one-dimensional first-order Edgeworth expansion, 

provided that we have $\sup_{t\in(0,1]\cap\mathbb{T}} \mathbb{E}_{\pi,0}[|Y_t|^{\alpha}] < \infty$ with the expected order $\alpha$ with respect to the independent case (up to some $\varepsilon > 0$ for (c) and (d)). For the statements (b) and (d), a Markov nonlattice condition is also assumed as in the independent case. All the results are derived under the assumption that the Markov process $(X_t)_{t\in\mathbb{T}}$ has an invariant probability distribution $\pi$, is stationary and has the $L^2(\pi)$-spectral gap property (that is, $(X_t)_{t\in\mathbb{N}}$ is $\rho$-mixing in the discrete-time case). The case where $(X_t)_{t\in\mathbb{T}}$ is non-stationary is briefly discussed. As an application, we derive a Berry-Esseen bound for the M-estimators associated with $\rho$-mixing Markov chains.

Subject classification: 60J05, 60F05, 60J25, 60J55, 37A30, 62M05

Keywords: Markov additive process, central limit theorems, Berry-Esseen bound, Edgeworth expansion, spectral method, $\rho$-mixing, M-estimator.



1 Introduction 

In this paper, we are concerned with the class of Markov Additive Processes (MAP). The discrete and continuous-time cases are considered, so that the time parameter set $\mathbb{T}$ will denote $\mathbb{N}$ or $[0,+\infty)$. Let $\mathbb{X}$ be any set equipped with a $\sigma$-algebra $\mathcal{X}$ and let $\mathcal{B}(\mathbb{R}^d)$ be the Borel $\sigma$-algebra on $\mathbb{R}^d$ ($d\ge 1$). A (time homogeneous) MAP $(X_t,Y_t)_{t\in\mathbb{T}}$ is a (time homogeneous) Markov process with state space $\mathbb{X}\times\mathbb{R}^d$ and transition semigroup $(Q_t)_{t\in\mathbb{T}}$ satisfying: $\forall t\in\mathbb{T}$, $\forall (x,y)\in\mathbb{X}\times\mathbb{R}^d$, $\forall (A,B)\in\mathcal{X}\times\mathcal{B}(\mathbb{R}^d)$,

$$Q_t(x,y;A\times B) = Q_t(x,0;A\times (B-y)). \qquad (1.1)$$
In other words, the transition semigroup is additive in the second component. It follows from the definition that the first component $(X_t)_{t\in\mathbb{T}}$ of a MAP is a (time homogeneous) Markov process. The second component $(Y_t)_{t\in\mathbb{T}}$ must be thought of as a process with independent increments given $\sigma(X_s, s\ge 0)$. We refer to [15] for the general structure of such processes. Note that a discrete-time MAP is also called a Markov Random Walk (MRW). In stochastic modelling, the first component of a MAP is usually associated with a random environment which drives or modulates the additive component $(Y_t)_{t\in\mathbb{T}}$. The MAPs have been found to be an important tool in various areas such as communication networking (e.g. see [2, 71, 72]), finance (e.g. see [1, 3, 56]), reliability (e.g. see [17, 37, 64, 70]), ... Some important instances of MAP are:

• in discrete/continuous-time: $(X_t,Y_t)_{t\in\mathbb{T}}$ where $(Y_t)_{t\in\mathbb{T}}$ is an $\mathbb{R}^d$-valued additive functional (AF) of the Markov process $(X_t)_{t\in\mathbb{T}}$. Therefore any result on the second component of a MAP applies to an AF. Basic discrete and continuous-time AFs are respectively

$$Y_0 = 0,\quad \forall t\in\mathbb{N}^*,\ Y_t = \sum_{k=1}^{t}\xi(X_k);\qquad \forall t\in[0,+\infty),\ Y_t = \int_0^t \xi(X_s)\,ds \qquad (1.2)$$

where $\xi$ is an $\mathbb{R}^d$-valued function satisfying conditions under which $Y_t$ is well-defined for every $t\in\mathbb{T}$. When $(X_t)_{t\in\mathbb{T}}$ is a regular Markov jump process, then any non-decreasing AF has the form (e.g. [16])

$$\int_0^t \xi_1(X_s)\,ds + \sum_{s\le t}\xi_2(X_{s-},X_s)$$

where $X_{t-} = \lim_{s\to t,\, s<t} X_s$, and $\xi_1$ and $\xi_2$ are non-negative measurable functions such that $\xi_2(x,x) = 0$ for every $x\in\mathbb{X}$. General representations and properties of AFs may be found in [5, 77, and references therein]. Such AFs are basically introduced when some kind of "rewards" are collected along with the dynamics of the Markov process $(X_t)_{t\in\mathbb{T}}$ through the state space $\mathbb{X}$. Thus, $Y_t$ is the accumulated reward on the finite interval $[0,t]$. Even if the state space $\mathbb{X}$ is a finite set, the numerical computation of the probability distribution of such AFs is not an easy task (e.g. see [9, 82]).

• in discrete-time: the Markov renewal processes when the random variables $Y_t$, $t\in\mathbb{N}$, are non-negative; if we consider a hidden Markov chain $(X_t,Z_t)_{t\in\mathbb{N}}$, where the so-called observed process $(Z_t)_{t\in\mathbb{N}}$ is $\mathbb{R}^d$-valued ($Z_0 = 0$), then $(X_t,\sum_{k=1}^{t} Z_k)_{t\in\mathbb{N}}$ is a MAP.

• in continuous time: the Markovian Arrival Process where $(X_t)_{t\in\mathbb{T}}$ is a regular jump process and $(Y_t)_{t\in\mathbb{T}}$ is a point process (see [2]), which includes the so-called Markov Modulated Poisson Process.
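
The discrete-time instances above can be illustrated by a short simulation. The following Python sketch is not part of the original paper: the function names and the two-state transition matrix used in the usage note are illustrative assumptions. It builds the additive functional $Y_t=\sum_{k=1}^{t}\xi(X_k)$ of (1.2), and a second component of the form $Y_t=\sum_{k=1}^{t}\xi(X_{k-1},X_k)$, along a simulated finite-state path.

```python
import random
from itertools import accumulate


def simulate_chain(P, x0, n, rng):
    """Simulate n steps of a finite-state Markov chain.

    P is a list of rows (transition probabilities), x0 the initial state,
    rng a random.Random instance. Returns the path [X_0, ..., X_n].
    """
    path = [x0]
    for _ in range(n):
        row = P[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def additive_functional(path, xi):
    """Return [Y_0, ..., Y_n] with Y_0 = 0 and Y_t = sum_{k=1}^t xi(X_k)."""
    return [0.0] + list(accumulate(xi(x) for x in path[1:]))


def markov_random_walk(path, xi2):
    """Return [Y_0, ..., Y_n] with Y_t = sum_{k=1}^t xi2(X_{k-1}, X_k)."""
    return [0.0] + list(accumulate(xi2(u, v) for u, v in zip(path, path[1:])))
```

For instance, with the hypothetical matrix `P = [[0.7, 0.3], [0.2, 0.8]]`, the call `simulate_chain(P, 0, 100, random.Random(0))` gives a path of the driving chain, and `markov_random_walk(path, lambda u, v: float(u != v))` counts its jumps, a simple discrete-time analogue of a point process modulated by $(X_t)$.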

Seminal works on MAPs are [21, 22, 59, 69, 75] and are essentially concerned with a finite Markov process $(X_t)_{t\in\mathbb{T}}$ as first component. The second component $(Y_t)_{t\in\mathbb{T}}$ was sometimes called a process defined on a Markov process. When $\mathbb{X}$ is a finite set, the structure of MAPs is well understood and an account of what is known can be found in [2, Chap XI]. In this paper, we are concerned with Gaussian approximations of the distribution of the second component $Y_t$ of a MAP. Central limit theorems for $(Y_t)_{t\in\mathbb{T}}$ may be found in [7, 27, 30, 50, 51, 59, 61, 75, 83, 84] under various assumptions. Here, such results are derived when $(X_t)_{t\in\mathbb{T}}$ has an invariant probability measure $\pi$, is stationary and has the $L^2(\pi)$-spectral gap property (see conditions (AS1-AS2) below). Moreover, standard refinements of the central limit theorem (CLT) related to the convergence rate are provided. Beforehand, the notation and assumptions used throughout the paper are introduced.

Let $(X_t,Y_t)_{t\in\mathbb{T}}$ be a MAP with state space $\mathbb{X}\times\mathbb{R}^d$ and transition semigroup $(Q_t)_{t\in\mathbb{T}}$. $(\mathbb{X},\mathcal{X})$ is assumed to be a measurable space. In the continuous-time case, $(X_t,Y_t)_{t\in\mathbb{T}}$ is assumed to be progressively measurable. $(X_t)_{t\in\mathbb{T}}$ is also a Markov process with transition semigroup $(P_t)_{t\in\mathbb{T}}$ given by

$$P_t(x,A) := Q_t(x,0;A\times\mathbb{R}^d).$$

Throughout the paper, we assume that $(X_t)_{t\in\mathbb{T}}$ has a unique invariant probability measure denoted by $\pi$ ($\forall t\in\mathbb{T}$, $\pi\circ P_t = \pi$). We denote by $L^2(\pi)$ the usual Lebesgue space of (classes of) functions $f:\mathbb{X}\to\mathbb{C}$ such that $\|f\|_2 := \sqrt{\pi(|f|^2)} = (\int_{\mathbb{X}} |f|^2\,d\pi)^{1/2} < \infty$. The operator norm of a bounded linear operator $T$ on $L^2(\pi)$ is defined by $\|T\|_2 := \sup_{\{f\in L^2(\pi),\ \|f\|_2=1\}} \|T(f)\|_2$. We appeal to the following conditions.
AS1. $(X_t)_{t\in\mathbb{T}}$ is stationary (i.e. $X_0\sim\pi$).

AS2. The semigroup $(P_t)_{t\in\mathbb{T}}$ of $(X_t)_{t\in\mathbb{T}}$ has a spectral gap on $L^2(\pi)$:

$$\lim_{t\to+\infty} \|P_t-\Pi\|_2 = 0, \qquad (1.3)$$

where $\Pi$ denotes the rank-one projection defined on $L^2(\pi)$ by: $\Pi f = \pi(f)\,1_{\mathbb{X}}$.

AS3. The process $(Y_t)_{t\in\mathbb{T}}$ satisfies the moment condition

$$\sup_{t\in(0,1]\cap\mathbb{T}} \mathbb{E}_{\pi,0}\big[|Y_t|^{\alpha}\big] < \infty \qquad (1.4)$$

where $|\cdot|$ denotes the euclidean norm on $\mathbb{R}^d$ and $\mathbb{E}_{\pi,0}$ is the expectation when $(X_0,Y_0)\sim(\pi,\delta_0)$.

In the discrete-time case, notice that the moment condition (1.4) reduces to (AS3d)

$$\mathbb{E}_{\pi,0}\big[|Y_1|^{\alpha}\big] < \infty \qquad \text{(AS3d)}$$

and that condition (AS2) is equivalent to the $\rho$-mixing property of $(X_t)_{t\in\mathbb{N}}$, with $\rho$-mixing coefficients going to 0 exponentially fast [81]. Condition (AS2) is also related to the notion of essential spectral radius (e.g. see [86]).
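
Condition (AS2) can be checked by hand on a two-state chain, where $P^n-\Pi=\lambda^n(I-\Pi)$ with $\lambda=1-a-b$ and $I-\Pi$ is an orthogonal projection on $L^2(\pi)$, so that $\|P^n-\Pi\|_2=|\lambda|^n$. The following Python sketch (an illustration that is not part of the original paper; the parametrisation $P=\begin{pmatrix}1-a&a\\ b&1-b\end{pmatrix}$ and all function names are assumptions made here) computes this operator norm numerically.

```python
import math


def stationary_two_state(a, b):
    """Invariant law of the chain with P = [[1-a, a], [b, 1-b]]."""
    return (b / (a + b), a / (a + b))


def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def l2_pi_norm(A, pi):
    """Operator norm of the 2x2 matrix A acting on L^2(pi)."""
    # Conjugating by diag(sqrt(pi)) turns the L^2(pi) norm into the
    # Euclidean spectral norm of M.
    s = [math.sqrt(p) for p in pi]
    M = [[s[i] * A[i][j] / s[j] for j in range(2)] for i in range(2)]
    # Largest eigenvalue of the symmetric 2x2 matrix G = M^T M (closed form).
    G = [[sum(M[k][i] * M[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    tr = G[0][0] + G[1][1]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    lam = (tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0))) / 2.0
    return math.sqrt(lam)


def spectral_gap_decay(a, b, n_max):
    """Return [||P^n - Pi||_2 for n = 1, ..., n_max] on L^2(pi)."""
    pi = stationary_two_state(a, b)
    P = [[1.0 - a, a], [b, 1.0 - b]]
    Pi = [[pi[0], pi[1]], [pi[0], pi[1]]]
    Pn = [[1.0, 0.0], [0.0, 1.0]]
    norms = []
    for _ in range(n_max):
        Pn = mat_mul(Pn, P)
        diff = [[Pn[i][j] - Pi[i][j] for j in range(2)] for i in range(2)]
        norms.append(l2_pi_norm(diff, pi))
    return norms
```

With $a=0.3$ and $b=0.2$, the returned norms decay like $0.5^n$, which is the geometric decay of the $\rho$-mixing coefficients mentioned above in this elementary case.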

Under (AS1-AS2), we show that the second component $(Y_t)_{t\in\mathbb{T}}$ of the MAP satisfies, in discrete and continuous time, the following standard limit theorems:

(a) the central limit theorem, under (AS3) with the optimal value $\alpha = 2$;

(b) the local limit theorem, under (AS3) with the optimal value $\alpha = 2$ and the additional classical Markov non-lattice condition;

(c) the one-dimensional Berry-Esseen theorem, under (AS3) with the (almost) optimal value ($\alpha > 3$);

(d) a one-dimensional first-order Edgeworth expansion, under (AS3) with the (almost) optimal value ($\alpha > 3$) and the Markov non-lattice condition.

These results correspond to the classical statements for the sequences of independent and identically distributed (i.i.d.) random variables, with the same order $\alpha$ (up to $\varepsilon > 0$ in (c) and (d)). Such results are known for special MAPs satisfying (AS2) (comparison with earlier works is made after each statement), but to the best of our knowledge, the results (a)-(d) are new for general MAPs satisfying (AS2), as, for instance, for AFs involving unbounded functionals.

Here, the main arguments are

• for the statement (a): the $\rho$-mixing property of the increments $(Y_{t+1}-Y_t)_{t\in\mathbb{T}}$ of the process $(Y_t)_{t\in\mathbb{T}}$ (see Proposition 3.1). This result, which is of independent interest, is new to the best of our knowledge. The closest work to this part is a result of [38] which, by using $\phi$-mixing properties, gives the CLT for MAPs associated with uniformly ergodic driving Markov chains (i.e. $(P_t)_{t\in\mathbb{T}}$ has a spectral gap on the usual Lebesgue space $L^{\infty}(\pi)$). Condition (AS2) is less restrictive than uniform ergodicity (which is linked to the so-called Doeblin condition).

• for the refinements (b-d): the Nagaev-Guivarc'h spectral method. The closest works to this part are, in discrete-time, the paper [49] in which these refinements are obtained for the AF $Y_t = \sum_{k=1}^{t}\xi(X_k)$, and in continuous-time, the work of Lezaud [62] which proves, under the uniform ergodicity assumption, a Berry-Esseen bound for the integral additive functional (1.2). Here, in discrete-time, we borrow to a large extent the weak spectral method of [49]: this is outlined in Proposition 4.2, which gives a precise expansion (close to the i.i.d. case) of the characteristic function of $Y_t$. For continuous-time MAPs, similar expansions can be derived from the semigroup property of the Fourier operators of the MAP. Proposition 4.2, and its continuous-time counterpart Proposition 4.4, are the key results to establish limit theorems (as for instance the statements (b-d)) with the help of Fourier techniques.

The classical (discrete and continuous-time) models for which the spectral gap property (AS2) is met are briefly reviewed in Subsections 2.2-2.4. The above limit theorems (a)-(d) are valid in all these examples and open up possibilities for new applications. First, our moment conditions are optimal (or almost optimal). For instance, in continuous time, the Berry-Esseen bound in [62] requires that $\xi$ in the integral (1.2) is bounded, while our statement (c) holds true under the condition $\pi(|\xi|^{3+\varepsilon}) < \infty$. Second, our results are true for general MAPs. For instance, they apply to $Y_t = \sum_{k=1}^{t}\xi(X_{k-1},X_k)$. This fact enables us to prove a Berry-Esseen bound for M-estimators associated with $\rho$-mixing Markov chains, under a moment condition which greatly improves the results in [76].

The paper is organised as follows. The $L^2(\pi)$-spectral gap assumption for a Markov process is briefly discussed in Section 2 and connections to standard ergodic properties are pointed out. In Section 3, the CLT for $(Y_t)_{t\in\mathbb{T}}$ under (AS1)-(AS3) with $\alpha = 2$ is derived. The functional central limit theorem (FCLT) is also discussed. Section 4 is devoted to refinements of the CLT. First, the Fourier operator is introduced in Subsection 4.1, the characteristic function of $Y_t$ is investigated in Subsection 4.2, and our limit theorems are proved for discrete-time MAPs in Subsection 4.3. Their extension to the non-stationary case is discussed in Subsection 4.4. The continuous-time case is studied in Subsection 4.5. The statistical application to M-estimators for $\rho$-mixing Markov chains is developed in Section 5.

Finally, we point out that the natural way to consider the Nagaev-Guivarc'h method in 
continuous-time is the semigroup property of the Fourier operators of the MAP (see Subsec- 
tion 4.1 for details). To the best of our knowledge, this property, which is closely related 
to the additivity condition (1.1) defining a MAP, has been introduced and only exploited in 
[50]. 
2 The $L^2(\pi)$-spectral gap property (AS2)



2.1 Basic facts on property (AS2) 

We discuss the condition (AS2) for the semigroup $(P_t)_{t\in\mathbb{T}}$ of $(X_t)_{t\in\mathbb{T}}$. It is well-known that $(P_t)_{t\in\mathbb{T}}$ is a contraction semigroup on each Lebesgue space $L^p(\pi)$ for $1\le p\le+\infty$, that is: we have $\|P_t\|_p\le 1$ for all $t\in\mathbb{T}$, where $\|\cdot\|_p$ denotes the operator norm on $L^p(\pi)$. Condition (AS2), introduced by Rosenblatt [81] and also called strong ergodicity on $L^2(\pi)$, implies that $(P_t)_{t\in\mathbb{T}}$ is strongly ergodic on each $L^p(\pi)$ ($1<p<+\infty$), that is $\|P_t-\Pi\|_p\to 0$ when $t\to+\infty$. Moreover, (AS2) is fulfilled under the so-called uniform ergodicity property, i.e. the strong ergodicity on $L^{\infty}(\pi)$. These properties, established in [81], can be easily derived from the Riesz-Thorin interpolation theorem [6] which ensures, thanks to the contraction property of $P_t$, that

$$\|P_t-\Pi\|_p \le \|P_t-\Pi\|_{p_1}^{\alpha}\,\|P_t-\Pi\|_{p_2}^{1-\alpha} \le 2\min\big\{\|P_t-\Pi\|_{p_1}^{\alpha},\ \|P_t-\Pi\|_{p_2}^{1-\alpha}\big\}, \qquad (2.1)$$

where $p_1,p_2\in[1,+\infty]$ and $p\in[1,+\infty]$ satisfy $1/p = \alpha/p_1 + (1-\alpha)/p_2$ for some $\alpha\in[0,1]$. Indeed, assume that Condition (AS2) holds. Then Inequality (2.1) with ($p_1 = 2$, $p_2 = +\infty$) and $\alpha\in(0,1)$ gives the strong ergodicity on $L^p(\pi)$ for each $p\in(2,+\infty)$. Notice that the value $p = +\infty$ is obtained with $\alpha = 0$, but in this case, the uniform ergodicity cannot be deduced from (AS2) and (2.1). In fact the uniform ergodicity condition is stronger than (AS2) (see [81]). Next, Inequality (2.1) with ($p_1 = 2$, $p_2 = 1$) and $\alpha\in(0,1)$ gives the strong ergodicity on $L^p(\pi)$ for each $p\in(1,2)$. The value $p = 1$ is obtained with $\alpha = 0$, but the strong ergodicity on $L^1(\pi)$ cannot be deduced from (AS2) and (2.1). Finally, if the uniform ergodicity is assumed, then Inequality (2.1) with ($p_1 = +\infty$, $p_2 = 1$) and $\alpha = 1/2$ yields (AS2).

Also notice that the strong ergodicity property on $L^p(\pi)$ holds if and only if there exist some strictly positive constants $C$ and $\varepsilon$ such that we have for all $t\in\mathbb{T}$:

$$\|P_t-\Pi\|_p \le C\,e^{-\varepsilon t}. \qquad (2.2)$$

Indeed, if $\kappa_0 := \|P_{\tau}-\Pi\|_p < 1$ for some $\tau\in\mathbb{T}$ (which holds under the strong ergodicity property), then we have for all $n\in\mathbb{N}^*$: $\|P_{n\tau}-\Pi\|_p = \|P_{\tau}^n-\Pi\|_p = \|(P_{\tau}-\Pi)^n\|_p \le \kappa_0^n$. Writing $t = w + n\tau$ with $n\in\mathbb{N}^*$ and $w\in[0,\tau)$, we obtain: $\|P_t-\Pi\|_p = \|P_w(P_{\tau}^n-\Pi)\|_p \le \kappa_0^n \le C\,e^{-\varepsilon t}$ with $C := 1/\kappa_0$ and $\varepsilon := (-1/\tau)\ln\kappa_0$. The converse implication is obvious. Thus, the strong ergodicity property on $L^2(\pi)$, i.e. condition (AS2), is equivalent to requiring the $L^2(\pi)$-exponential ergodicity (2.2), that is the $L^2(\pi)$-spectral gap property.
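
The passage from a single contraction factor $\kappa_0<1$ at time $\tau$ to the exponential bound (2.2) is pure arithmetic, since $n > t/\tau - 1$ gives $\kappa_0^n \le \kappa_0^{t/\tau-1} = C e^{-\varepsilon t}$. The Python sketch below (not from the original paper; the function names are hypothetical) computes $C$ and $\varepsilon$ as defined above and exposes the block bound $\kappa_0^{\lfloor t/\tau\rfloor}$ so that the inequality can be checked numerically.

```python
import math


def exponential_constants(kappa0, tau):
    """Constants of (2.2): C = 1/kappa0 and eps = -(1/tau) * ln(kappa0)."""
    if not (0.0 < kappa0 < 1.0) or tau <= 0.0:
        raise ValueError("need 0 < kappa0 < 1 and tau > 0")
    return 1.0 / kappa0, -math.log(kappa0) / tau


def block_bound(kappa0, tau, t):
    """Bound kappa0**n with n = floor(t / tau), i.e. t = w + n*tau, 0 <= w < tau."""
    return kappa0 ** math.floor(t / tau)


def exponential_bound(kappa0, tau, t):
    """Right-hand side C * exp(-eps * t) of (2.2)."""
    C, eps = exponential_constants(kappa0, tau)
    return C * math.exp(-eps * t)
```

For example, with $\kappa_0 = 0.5$ and $\tau = 2$, one gets $C = 2$ and $\varepsilon = (\ln 2)/2$, and `block_bound` stays below `exponential_bound` for every $t \ge 0$.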

In the next subsections, Markov models with a spectral gap on $L^2(\pi)$ arising from stochastic modelling and potentially relevant to our framework are introduced. Assumption (AS2) can also be met in more abstract settings, as for instance in [41] where the $L^2$-spectral gap property for classic Markov operators (with a state space defined as the $d$-dimensional torus) is proved.

2.2 Geometric ergodicity and property (AS2) 

Recall that $(X_t)_{t\in\mathbb{N}}$ is $V$-geometrically ergodic if its transition kernel $P$ has an invariant probability measure $\pi$ and is such that there are $r\in(0,1)$, a finite constant $K$ and a $\pi$-a.e. finite function $V:\mathbb{X}\to[1,+\infty]$ such that

$$\forall n\ge 0,\ \pi\text{-a.e. } x\in\mathbb{X},\quad \sup\big\{|P^n f(x)-\pi(f)|,\ f:\mathbb{X}\to\mathbb{C},\ |f|\le V\big\} \le K\,V(x)\,r^n. \qquad \text{(VG)}$$
In fact, when $(X_t)_{t\in\mathbb{N}}$ is $\psi$-irreducible (i.e. $\psi(A) > 0 \Longrightarrow P(x,A) > 0$, $\forall x\in\mathbb{X}$) and aperiodic [67], condition (VG) is equivalent to the standard geometric ergodicity property [78]: there are functions $r:\mathbb{X}\to(0,1)$ and $C:\mathbb{X}\to[1,+\infty)$ such that: for all $n\in\mathbb{N}$, $\pi$-a.e. $x\in\mathbb{X}$,

$$\|P^n(x,\cdot)-\pi(\cdot)\|_{TV} := \sup\big\{|P^n f(x)-\pi(f)|,\ f:\mathbb{X}\to\mathbb{C},\ |f|\le 1\big\} \le C(x)\,r(x)^n.$$

There is another equivalent operational condition to geometric ergodicity for $\psi$-irreducible and aperiodic Markov chains $(X_t)_{t\in\mathbb{N}}$, the so-called "drift-criterion": there exist a function $V:\mathbb{X}\to[1,+\infty]$, a small set $C\subset\mathbb{X}$ and constants $\delta > 0$, $b < \infty$ such that

$$PV \le (1-\delta)V + b\,1_C.$$
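
As an illustration of the drift criterion, the following Python sketch (not from the original paper) verifies the inequality $PV\le(1-\delta)V+b\,1_C$ for a reflected nearest-neighbour walk on $\{0,\dots,N-1\}$ with downward drift. The chain, the choice $V(x)=z^x$ and the set $C=\{0\}$ are assumptions made here for the example; on a finite state space a singleton is a small set.

```python
def drift_check(p_up, n_states, z):
    """Check PV <= (1 - delta) V + b 1_C for a reflected walk on {0, ..., n_states-1}.

    The walk moves up with probability p_up and down with probability
    1 - p_up, staying put when a move would leave the state space.
    Lyapunov function: V(x) = z**x, small set: C = {0}.
    Returns (delta, b, ok).
    """
    q = 1.0 - p_up
    last = n_states - 1
    V = [z ** x for x in range(n_states)]
    PV = []
    for x in range(n_states):
        up = V[min(x + 1, last)]    # reflection at the upper boundary
        down = V[max(x - 1, 0)]     # reflection at 0
        PV.append(p_up * up + q * down)
    contraction = p_up * z + q / z  # PV(x) / V(x) at interior states
    delta = 1.0 - contraction
    b = PV[0] - contraction * V[0]  # extra term needed on C = {0}
    ok = all(
        PV[x] <= ((1.0 - delta) * V[x] + (b if x == 0 else 0.0)) * (1.0 + 1e-12)
        for x in range(n_states)
    )
    return delta, b, ok
```

With `p_up = 0.3` and `z = 1.5`, interior states satisfy $PV = (0.3z + 0.7/z)V$, so that $\delta = 1/12$, and the only state where the extra term $b$ is needed is $0$. The drift inequality requires $1 < z < (1-p_{up})/p_{up}$ for $\delta$ to be positive.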

We refer to [67] for details and applications, and to [57] for a recent survey on the CLT for the additive functionals of $(X_t)_{t\in\mathbb{N}}$ in (1.2). Now, the transition kernel $P$ is said to be reversible with respect to $\pi$ if

$$\pi(dx)P(x,dy) = \pi(dy)P(y,dx)$$

or equivalently if $P$ is self-adjoint on the space $L^2(\pi)$. It is well known that a $V$-geometrically ergodic Markov chain with a reversible transition kernel has the $L^2(\pi)$-spectral gap property [78]. Moreover, for a $\psi$-irreducible and aperiodic Markov chain $(X_t)_{t\in\mathbb{N}}$ with reversible transition kernel, ($V$-)geometric ergodicity is shown to be equivalent to the existence of a spectral gap in $L^2(\pi)$, and, when $X_0\sim\mu$, we also have [78, Th 2.1], [80]

$$\|\mu P^n(\cdot)-\pi(\cdot)\|_{TV} \le \|\mu-\pi\|_{L^2(\pi)}\,r^n, \qquad \text{(R)}$$

where $r := \lim_{n\to+\infty}(\|P^n-\Pi\|_2)^{1/n}$ and $\|\mu-\pi\|_{L^2(\pi)} := \|d\mu/d\pi - 1\|_2$ if well-defined and $\infty$ otherwise. Note that the reversibility condition is central to the previous discussion on the $L^2(\pi)$-spectral gap property. Indeed, there exists a $\psi$-irreducible and aperiodic Markov chain which is geometrically ergodic but does not admit a spectral gap on $L^2(\pi)$ [43].

Such a context of geometric ergodicity and reversible kernel is relevant to the Markov Chain Monte Carlo (MCMC) methodology for sampling a given probability distribution, i.e. the target distribution. Indeed, the basic idea is to define a Markov chain $(X_t)_{t\in\mathbb{N}}$ with the target distribution as invariant probability measure $\pi$. Then an MCMC algorithm is a scheme to draw samples from the stationary Markov chain $(X_t)_{t\in\mathbb{N}}$. However, the initial condition of the algorithm, i.e. the probability distribution of $X_0$, is not $\pi$ since the target distribution is inaccessible. Therefore the convergence in distribution of the Markov chain to $\pi$ with respect to the probability distribution of $X_0$ must be guaranteed, and the knowledge of the convergence rate is crucial to monitor the sampling. Thus, central limit theorems for Markov chains and quantitative bounds as in (R) are highly desirable. Geometric ergodicity of Hastings-Metropolis type algorithms has been investigated by many researchers. Two standard instances are the full dimensional and random-scan symmetric random walk Metropolis algorithms [25, 55, and references therein]. Note that the first algorithm is also referred to as a special instance of the Hastings algorithm and the second one as a Metropolis-within-Gibbs sampler. Let $\pi$ be a probability distribution on $\mathbb{R}^d$ which is assumed to have a positive and continuous density with respect to the Lebesgue measure. The so-called proposal densities are assumed to be bounded away from 0 in some region around zero (the moves through the state space $\mathbb{X}$ are based on these probability distributions). These conditions ensure that the corresponding transition kernel for each algorithm is $\psi$-irreducible, aperiodic and reversible with respect to $\pi$. Geometric ergodicity for the Markov chain $(X_t)_{t\in\mathbb{N}}$ (and so the existence of a spectral gap in $L^2(\pi)$) is closely related to the tails of the target distribution $\pi$. For instance, in the first algorithm, it can be shown that $\pi$ must have an exponential moment [55, Cor 3.4]. A sufficient condition for geometric ergodicity in case of super-exponential target densities is of the form [55, Th 4.1]

$$\limsup_{|x|\to+\infty}\ \Big\langle \frac{x}{|x|},\ \frac{\nabla\pi(x)}{|\nabla\pi(x)|} \Big\rangle < 0.$$

For the second algorithm, sufficient conditions for geometric ergodicity are reported in [25] 
when the target density decreases either subexponentially or exponentially in the tails. A 
very large set of examples and their respective merit are discussed in these two references. 
We refer to [79, and references therein] for a recent survey on the theory of Markov chains in 
connection with MCMC algorithms. 
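
To fix ideas, here is a minimal Python sketch of a symmetric random walk Metropolis sampler. It is not taken from the paper or from [25, 55]: it is restricted to a one-dimensional target specified through its log-density, uses a Gaussian proposal, and the function name and parameters are illustrative assumptions. The symmetric proposal makes the resulting kernel reversible with respect to the target.

```python
import math
import random


def random_walk_metropolis(log_density, x0, n_steps, step, rng):
    """One-dimensional symmetric random walk Metropolis sampler.

    log_density: log of the (possibly unnormalised) target density.
    x0: initial state, step: standard deviation of the Gaussian proposal.
    rng: a random.Random instance.
    Returns (samples, acceptance_rate).
    """
    x, lp = x0, log_density(x0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)      # symmetric proposal around x
        lq = log_density(y)
        # Accept with probability min(1, pi(y) / pi(x)).
        if rng.random() < math.exp(min(0.0, lq - lp)):
            x, lp = y, lq
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps
```

For a standard Gaussian target, `random_walk_metropolis(lambda x: -0.5 * x * x, 0.0, 20000, 1.0, random.Random(1))` returns a path whose empirical mean and variance are close to 0 and 1. The Gaussian target has super-exponential tails, which is the regime covered by the sufficient condition quoted above.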



2.3 Uniform ergodicity and hidden Markov chains 

As mentioned in the introduction, a discrete-time MAP is closely related to a hidden Markov chain. Standard issues for hidden Markov chains require knowledge of the convergence rate of the hidden Markov state process $(X_t)_{t\in\mathbb{N}}$. One of them is the state estimation via filtering or smoothing. In such a context, minorization conditions on $P$ are usually involved. The basic one is: there exists a bounded positive measure $\varphi$ on $\mathbb{X}$ such that for some $m\in\mathbb{N}^*$:

$$\forall x\in\mathbb{X},\ \forall A\in\mathcal{X},\quad P^m(x,A) \ge \varphi(A). \qquad \text{(UE)}$$

It is well-known that this is equivalent to the uniform ergodicity property or to condition (VG) with $V(x) = 1$ [67, Th 16.2.1, 16.2.2]. Recall that uniform ergodicity gives the $L^2(\pi)$-spectral gap property (AS2), but the converse is not true. Another minorization condition is the so-called "Doeblin condition": there exists a probability measure $\varphi$ such that for some $m$, $\varepsilon < 1$ and $\delta > 0$ [20]

$$\varphi(A) > \varepsilon \Longrightarrow \forall x\in\mathbb{X},\ P^m(x,A) \ge \delta. \qquad (D_0)$$

It is well known that, for ergodic and aperiodic Markov chains, $(D_0)$ is equivalent to the uniform ergodicity. We refer to [14, and the references therein] for an excellent overview of the interplay between the Markov chain theory and the hidden Markov models.
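
On a finite state space, the largest measure $\varphi$ satisfying (UE) is $\varphi(\{j\})=\min_i P^m(i,j)$, and the classical coupling argument then gives $\sum_j|P^n(x,j)-\pi_j|\le 2(1-\varphi(\mathbb{X}))^{\lfloor n/m\rfloor}$. The following Python sketch (an illustration, not part of the original paper; the function names are hypothetical) computes the total mass $\varphi(\mathbb{X})$ and the distance to $\pi$ in the sense of the total variation norm used in Subsection 2.2.

```python
def mat_power(P, m):
    """m-th power of a square matrix given as a list of rows."""
    n = len(P)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        R = [[sum(R[i][k] * P[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return R


def minorization_mass(P, m=1):
    """Total mass of phi({j}) = min_i P^m(i, j), the largest phi in (UE)."""
    Pm = mat_power(P, m)
    n = len(P)
    return sum(min(Pm[i][j] for i in range(n)) for j in range(n))


def tv_to_stationary(P, pi, x, n):
    """sup over |f| <= 1 of |P^n f(x) - pi(f)|, i.e. sum_j |P^n(x, j) - pi_j|."""
    Pn = mat_power(P, n)
    return sum(abs(Pn[x][j] - pi[j]) for j in range(len(pi)))
```

For the hypothetical matrix `P = [[0.7, 0.3], [0.2, 0.8]]` with invariant law `pi = [0.4, 0.6]`, the minorization mass is 0.5 with `m = 1`, and `tv_to_stationary(P, pi, x, n)` is bounded by `2 * 0.5 ** n` for both initial states.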



2.4 Property (AS2) for continuous time Markov processes 

The Markov jump processes are a basic class of continuous-time Markov models which are of wide interest in stochastic modelling. The $L^2(\pi)$-exponential convergence has long received attention. We refer to [18] for a good account of what is known on ergodic properties for such processes. In particular, the $L^2(\pi)$-spectral gap property is shown to be equivalent to the standard exponential ergodicity for the birth-death processes:

$$\exists\beta > 0 \text{ such that } \forall (i,j)\in\mathbb{X}^2,\ \exists C_i\ge 0,\quad |P_t(i,j)-\pi_j| \le C_i\exp(-\beta t),\quad t\to+\infty$$

where $(P_t(i,j))_{i,j\in\mathbb{X}}$ is the matrix semigroup of $(X_t)_{t\ge 0}$. This is also true for the reversible Markov jump processes. Hence, in these cases, criteria for exponential ergodicity are also valid to check the $L^2(\pi)$-exponential convergence. Moreover, explicit bounds on the spectral gap are discussed in detail in [18]. For the birth-death processes, we also refer to [58, and references therein], where explicit formulas are obtained for classical Markov queuing processes. The birth-death processes are often used as reference processes for analyzing general stochastic models. This idea was in force in Liggett's derivation of the $L^2$-exponential convergence of supercritical nearest particle systems [63]. The interacting systems of particles are also a source of examples of processes with an $L^2$-spectral gap. We refer to [63] for such a discussion on various classes of stochastic Ising models. In physics, and especially in statistical physics, many evolution models are given by stochastic ordinary/partial equations. When the solutions are finite/infinite dimensional Markov processes, standard issues arise: existence and uniqueness of an invariant probability measure, and ergodic properties, which include the rate of convergence to the invariant measure with respect to some norm. Such issues may be included in the general topic of the stability of solutions of stochastic differential equations (SDEs). Thus, it is not surprising that ergodic concepts such as $V$-geometric ergodicity and the associated Lyapunov-type criteria, originally developed by Meyn and Tweedie [67] for studying the stability of discrete-time Markov models, have been found to be of value (e.g. see [32, and references therein]). Here, we are only concerned with the $L^2(\pi)$-exponential convergence, so that we only mention some related results.

An instance of $L^2(\pi)$-spectral gap can be found in [28] where the following SDE is considered

$$dX_t = -\tfrac{1}{2}\,b(X_t)\,dt + dW_t,\qquad X_0 = x\in\mathbb{R}^d$$

where $(W_t)_{t\ge 0}$ is the standard $d$-dimensional Brownian motion and $b(\cdot)$ is a gradient field from $\mathbb{R}^d$ to $\mathbb{R}^d$ (with suitable properties ensuring essentially the existence of a unique strong solution to the equation, which has a unique invariant probability measure). When $b(\cdot)$ is a radial function satisfying $b(x)\sim C|x|^{\alpha}$ for $\alpha > 1$ when $|x|\to+\infty$, then the semigroup is shown to be ultracontractive and to have an $L^2(\pi)$-spectral gap [28].

Another instance of $L^2(\pi)$-spectral gap is related to the $\mathbb{R}$-valued Markov process solution to

$$dX_t = b(X_t)\,dt + a(X_t)\,dW_t \qquad (2.3)$$

where $(W_t)_{t\ge 0}$ is the standard 1-dimensional Brownian motion and $X_0$ is a random variable independent of $(W_t)_{t\ge 0}$. Standard assumptions ensure that the solution of the SDE above is a positive recurrent diffusion on some interval and a (strictly) stationary ergodic time-reversible process. Under additional conditions on the scale and the speed densities of the diffusion $(X_t)_{t\ge 0}$ [29, (A4) and reinforced (A5), Prop. 2.8], the transition semigroup of $(X_t)_{t\ge 0}$ is shown to have the $L^2(\pi)$-spectral gap property (explicit bounds on the spectral gap are also provided). The basic example studied in [29] is when $a(x) := c\,x^{\nu}$ and $b(x) := \alpha(\beta-x)$ with $\nu\in[1/2,1]$, $\alpha,\beta\in\mathbb{R}$. Conditions ensuring the $L^2(\pi)$-spectral gap property are provided in terms of these parameters. Applications to some classical models in finance are discussed. Note that statistical issues for continuous-time Markov processes such as the jump or diffusion processes are related to the time discretization or sampling schemes of these processes. This often provides discrete-time Markov chains which inherit ergodic properties of the original continuous-time process. Thus we turn to the discussion on the discrete-time case (e.g. see [19] for the jump processes, [29] and the references therein for the (hidden) diffusions). Finally, the context of the stochastic differential equation (2.3) can be generalized to Markov $H$-valued processes solution to infinite dimensional SDEs, where $H$ is a Hilbert space. A good account of these generalizations can be found in [33, and references therein].
3 The $\rho$-mixing property and central limit theorems

Let $(X_t,Y_t)_{t\in\mathbb{T}}$ be a MAP taking values in $\mathbb{X}\times\mathbb{R}^d$. $\mathbb{E}_{(x,0)}$ and $\mathbb{E}_{\pi,0}$ are the expectations with respect to the initial conditions $(X_0,Y_0)\sim(\delta_x,\delta_0)$ and $(X_0,Y_0)\sim(\pi,\delta_0)$ respectively. First, basic facts for MAPs are presented. Second, they are used to show that, for a discrete-time MAP, the increment process $(Y_n-Y_{n-1})_{n\in\mathbb{N}^*}$ is exponentially $\rho$-mixing under (AS1-AS2). Then, a CLT is obtained under conditions (AS1-AS2) and the expected moment condition (AS3) (i.e. (AS3d)) with $\alpha = 2$.

3.1 Basic facts on MAPs 

Let $\mathcal{F}_t^{(X,Y)} := \sigma(X_u,Y_u,\ u\le t)$, $\mathcal{F}_t^{X} := \sigma(X_u,\ u\le t)$ and $\mathcal{F}_t^{Y} := \sigma(Y_u,\ u\le t)$ be the filtrations generated by the processes $(X_t,Y_t)_{t\in\mathbb{T}}$, $(X_t)_{t\in\mathbb{T}}$ and $(Y_t)_{t\in\mathbb{T}}$ respectively.

The additivity property (1.1) for the semigroup $(Q_t)_{t\in\mathbb{T}}$ reads as follows for any measurable ($\mathbb{C}$-valued) function $g$ on $\mathbb{X}\times\mathbb{R}^d$ and any $a\in\mathbb{R}^d$:

$$(Q_t g)_a = Q_t(g_a) \qquad (3.1)$$

where $g_a(x,y) := g(x,y+a)$ for every $(x,y)\in\mathbb{X}\times\mathbb{R}^d$. Let us introduce the following notation:

$$Q_s(x;dx_1\times dy_1) := Q_s(x,0;dx_1\times dy_1).$$

Then, we have: 

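Lemma 3.1. For any $\mathbb{C}$-valued function $g$ on $\mathbb{X}\times\mathbb{R}^d$ such that $\mathbb{E}[|g(X_u,Y_u)|] < \infty$ for every $u\in\mathbb{T}$, we have:

$$\mathbb{E}\big[g(X_{s+t},Y_{s+t}) \mid \mathcal{F}_s^{(X,Y)}\big] = Q_t(g_{Y_s})(X_s,0) = Q_t(g_{Y_s})(X_s), \qquad (3.2)$$

or in terms of the increments of the process $(Y_t)_{t\in\mathbb{T}}$:

$$\mathbb{E}\big[g(X_{s+t},Y_{s+t}-Y_s) \mid \mathcal{F}_s^{(X,Y)}\big] = Q_t(g)(X_s,0) = Q_t(g)(X_s) = \mathbb{E}_{(X_s,0)}\big[g(X_t,Y_t)\big]. \qquad (3.3)$$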

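Proof. The two formulas are derived as follows:

$$\begin{aligned}
\mathbb{E}\big[g(X_{s+t},Y_{s+t}) \mid \mathcal{F}_s^{(X,Y)}\big] &= \mathbb{E}\big[g(X_{s+t},Y_{s+t}) \mid X_s,Y_s\big] && \text{(Markov property)}\\
&= Q_t(g)(X_s,Y_s)\\
&= Q_t(g_{Y_s})(X_s,0) && \text{(from (3.1))}\\
&= Q_t(g_{Y_s})(X_s);
\end{aligned}$$

$$\begin{aligned}
\mathbb{E}\big[g(X_{s+t},Y_{s+t}-Y_s) \mid \mathcal{F}_s^{(X,Y)}\big] &= \mathbb{E}\big[g(X_{s+t},Y_{s+t}-Y_s) \mid X_s,Y_s\big] && \text{(Markov property)}\\
&= \mathbb{E}\big[g_{-Y_s}(X_{s+t},Y_{s+t}) \mid X_s,Y_s\big]\\
&= Q_t(g_0)(X_s,0) = Q_t(g)(X_s) && \text{(from (3.2))}\\
&= \mathbb{E}_{(X_s,0)}\big[g(X_t,Y_t)\big].
\end{aligned}$$

□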

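Lemma 3.2. For every $n\ge 1$ and any $\mathbb{C}$-valued function $g$ such that, for every $0\le u_1\le\cdots\le u_n$,

$$\mathbb{E}\big[|g(X_{u_1},Y_{u_1},X_{u_2},Y_{u_2}-Y_{u_1},\ldots,X_{u_n},Y_{u_n}-Y_{u_{n-1}})|\big] < \infty,$$

we have for any $s\ge 0$ and $t_1,\ldots,t_n\ge 0$:

$$\begin{aligned}
&\mathbb{E}\Big[g\big(X_{s+t_1},Y_{s+t_1}-Y_s,\ldots,X_{s+\sum_{i=1}^{n}t_i},Y_{s+\sum_{i=1}^{n}t_i}-Y_{s+\sum_{i=1}^{n-1}t_i}\big) \,\Big|\, \mathcal{F}_s^{(X,Y)}\Big]\\
&\qquad = \int_{(\mathbb{X}\times\mathbb{R}^d)^n} Q_{t_1}(X_s;dx_1\times dz_1)\prod_{i=2}^{n} Q_{t_i}(x_{i-1};dx_i\times dz_i)\,g(x_1,z_1,\ldots,x_n,z_n)\\
&\qquad =: \Big(\bigotimes_{i=1}^{n} Q_{t_i}\Big)(g)(X_s).
\end{aligned} \qquad (3.4)$$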



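Proof. Lemma 3.1 gives the case $n = 1$. Let us check that Formula (3.4) is valid for $n = 2$. This can help the reader to follow the induction.

$$\begin{aligned}
&\mathbb{E}\big[g(X_{s+t_1},Y_{s+t_1}-Y_s,X_{s+t_1+t_2},Y_{s+t_1+t_2}-Y_{s+t_1}) \mid \mathcal{F}_s^{(X,Y)}\big]\\
&\quad= \mathbb{E}\Big[\mathbb{E}\big[g(X_{s+t_1},Y_{s+t_1}-Y_s,X_{s+t_1+t_2},Y_{s+t_1+t_2}-Y_{s+t_1}) \mid \mathcal{F}_{s+t_1}^{(X,Y)}\big] \,\Big|\, \mathcal{F}_s^{(X,Y)}\Big]\\
&\quad= \mathbb{E}\Big[\int g(X_{s+t_1},Y_{s+t_1}-Y_s,x_2,y_2-Y_{s+t_1})\,Q_{t_2}(X_{s+t_1},Y_{s+t_1};dx_2\times dy_2) \,\Big|\, \mathcal{F}_s^{(X,Y)}\Big]\\
&\quad= \mathbb{E}\Big[\int g(X_{s+t_1},Y_{s+t_1}-Y_s,x_2,z_2)\,Q_{t_2}(X_{s+t_1};dx_2\times dz_2) \,\Big|\, \mathcal{F}_s^{(X,Y)}\Big] && \text{(using (1.1))}\\
&\quad= \int Q_{t_1}(X_s;dx_1\times dz_1)\int Q_{t_2}(x_1;dx_2\times dz_2)\,g(x_1,z_1,x_2,z_2) && \text{(using (3.3))}\\
&\quad= (Q_{t_1}\otimes Q_{t_2})(g)(X_s).
\end{aligned}$$

Let us now complete the induction. Assume that Property (3.4) is valid for $n-1$. Then

$$\begin{aligned}
&\mathbb{E}\Big[g\big(X_{s+t_1},Y_{s+t_1}-Y_s,\ldots,X_{s+\sum_{i=1}^{n}t_i},Y_{s+\sum_{i=1}^{n}t_i}-Y_{s+\sum_{i=1}^{n-1}t_i}\big) \,\Big|\, \mathcal{F}_s^{(X,Y)}\Big]\\
&\quad= \mathbb{E}\Big[\mathbb{E}\big[g\big(X_{s+t_1},Y_{s+t_1}-Y_s,\ldots,X_{s+\sum_{i=1}^{n}t_i},Y_{s+\sum_{i=1}^{n}t_i}-Y_{s+\sum_{i=1}^{n-1}t_i}\big) \mid \mathcal{F}_{s+t_1}^{(X,Y)}\big] \,\Big|\, \mathcal{F}_s^{(X,Y)}\Big]\\
&\quad= \mathbb{E}\Big[\Big(\bigotimes_{i=2}^{n} Q_{t_i}\Big)\big(g(X_{s+t_1},Y_{s+t_1}-Y_s,\cdot,\ldots,\cdot)\big)(X_{s+t_1}) \,\Big|\, \mathcal{F}_s^{(X,Y)}\Big] && \text{(induction)}\\
&\quad= \Big(Q_{t_1}\otimes\Big(\bigotimes_{i=2}^{n} Q_{t_i}\Big)\Big)(g)(X_s) && \text{(using (3.3))}.
\end{aligned}$$

□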



Corollary 3.1. Under (AS1), the following properties hold.

1. The process $(Y_t)_{t\in\mathbb{T}}$ has stationary increments, i.e.

$$\mathbb{E}_{\pi,0}\Big[g\big(Y_{s+t_1}-Y_s,\ldots,Y_{s+\sum_{i=1}^{n}t_i}-Y_{s+\sum_{i=1}^{n-1}t_i}\big)\Big] = \pi\Big(\Big(\bigotimes_{i=1}^{n} Q_{t_i}\Big)(g)\Big) \qquad (3.5)$$

does not depend on $s$ for any function $g$ as in Lemma 3.2.

2. If $\mathbb{E}_{\pi,0}[|Y_u|] < \infty$ for every $u\in\mathbb{T}$, then:

$$\forall (s,t)\in\mathbb{T}^2,\quad \mathbb{E}_{\pi,0}[Y_{s+t}] = \mathbb{E}_{\pi,0}[Y_t] + \mathbb{E}_{\pi,0}[Y_s].$$

3. $(\xi_n := Y_n-Y_{n-1})_{n\in\mathbb{N}^*}$ is a stationary sequence of $\mathbb{R}^d$-valued random variables and if $h$ is a $\mathbb{C}$-valued function such that $\mathbb{E}_{\pi,0}[|h(\xi_1,\ldots,\xi_n)|^2] = 1$, then $Q_1^{\otimes n}(h)\in L^2(\pi)$ with

$$\|Q_1^{\otimes n}(h)\|_2 \le 1, \qquad (3.6)$$

where $Q_1^{\otimes n}$ denotes the $n$-fold kernel product $\bigotimes_{i=1}^{n} Q_1$.



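Proof. Take the expectation of (3.4) with respect to the probability measure $\pi$:

$$\begin{aligned}
\mathbb{E}_{\pi,0}\Big[g\big(Y_{s+t_1}-Y_s,\ldots,Y_{s+\sum_{i=1}^{n}t_i}-Y_{s+\sum_{i=1}^{n-1}t_i}\big)\Big]
&= \mathbb{E}_{\pi,0}\Big[\Big(\bigotimes_{i=1}^{n} Q_{t_i}\Big)(g)(X_s)\Big]\\
&= \mathbb{E}_{\pi P_s,0}\Big[\Big(\bigotimes_{i=1}^{n} Q_{t_i}\Big)(g)(X_0)\Big]\\
&= \pi\Big(\Big(\bigotimes_{i=1}^{n} Q_{t_i}\Big)(g)\Big) && \text{(invariance property of } \pi\text{)}.
\end{aligned}$$

The second property is deduced from the stationarity of the increments of $(Y_t)_{t\in\mathbb{T}}$. Indeed, we can write $\mathbb{E}_{\pi,0}[Y_t] = \mathbb{E}_{\pi,0}[Y_{s+t}-Y_s] = \mathbb{E}_{\pi,0}[Y_{s+t}] - \mathbb{E}_{\pi,0}[Y_s]$. That $(\xi_n)_{n\in\mathbb{N}^*}$ is a stationary sequence of random variables follows from (3.5) with $s = 0$, $t_1 = \cdots = t_n = 1$. The last property follows from (3.4) and the Jensen inequality

$$\|Q_1^{\otimes n}(h)\|_2^2 = \big\|\mathbb{E}_{(\cdot,0)}[h(\xi_1,\ldots,\xi_n)]\big\|_2^2 \le \mathbb{E}_{\pi,0}\big[|h(\xi_1,\ldots,\xi_n)|^2\big] = 1.$$

□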
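
The second property of Corollary 3.1 can be checked exactly on a finite-state example. The Python sketch below is an illustration that is not part of the original paper: the function name, the two-state kernel and the reward vector are assumptions. It computes $\mathbb{E}[Y_n]$ for $Y_n=\sum_{k=1}^{n}\xi(X_k)$ by propagating the law of $X_k$, and shows that additivity of the mean holds when $X_0\sim\pi$ and may fail otherwise.

```python
def expected_additive_functional(P, init, xi, n):
    """Exact value of E[Y_n] for Y_n = sum_{k=1}^n xi(X_k), with X_0 ~ init.

    P: transition matrix (list of rows), init: initial law,
    xi: list of rewards indexed by state.
    """
    m = len(init)
    law = list(init)
    total = 0.0
    for _ in range(n):
        # Law of X_k obtained from the law of X_{k-1}.
        law = [sum(law[i] * P[i][j] for i in range(m)) for j in range(m)]
        total += sum(law[j] * xi[j] for j in range(m))
    return total
```

With `P = [[0.7, 0.3], [0.2, 0.8]]`, invariant law `[0.4, 0.6]` and rewards `[1.0, -2.0]`, the stationary mean is $-0.8\,n$, which is additive in $n$. Starting from the Dirac mass at state 0 gives $\mathbb{E}[Y_1]=0.1$ and $\mathbb{E}[Y_2]=-0.25$, so additivity is lost without (AS1).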



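Lemma 3.3. Let $\xi_n := Y_n-Y_{n-1}$ for $n\in\mathbb{N}^*$ (recall that $Y_0 = 0$ a.s.). Let $f$ and $h$ be two $\mathbb{C}$-valued functions such that $\mathbb{E}_{\pi,0}[|f(\xi_1,\ldots,\xi_n)|^2] < \infty$ and $\mathbb{E}_{\pi,0}[|h(\xi_{n+t},\ldots,\xi_{n+t+m})|^2] < \infty$ for $(t,n,m)\in(\mathbb{N}^*)^3$. Under (AS1), the covariance has the following form

$$\mathrm{Cov}_{\pi,0}\big(f(\xi_1,\ldots,\xi_n);\ h(\xi_{n+t},\ldots,\xi_{n+t+m})\big) = \mathbb{E}_{\pi,0}\Big[f(\xi_1,\ldots,\xi_n)\,(P_{t-1}-\Pi)\big(Q_1^{\otimes(m+1)}(h)\big)(X_n)\Big] \qquad (3.7)$$

where $Q_1^{\otimes(m+1)}$ denotes the $(m+1)$-fold kernel product $\bigotimes_{i=1}^{m+1} Q_1$.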

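Proof. Apply formula (3.4) with $\mathbb{E}_{(x,0)}$ to the specific function $g(x_1,z_1,\ldots,x_{n+t+m},z_{n+t+m}) \equiv f(z_1,\ldots,z_n)\times h(z_{n+t},\ldots,z_{n+t+m})$ with $t,n,m\ge 1$:

$$\begin{aligned}
&\mathbb{E}_{(x,0)}\big[f(\xi_1,\ldots,\xi_n)\,h(\xi_{n+t},\ldots,\xi_{n+t+m})\big] = Q_1^{\otimes(n+t+m)}(g)(x)\\
&\quad= \int Q_1(x;dx_1\times dz_1)\prod_{i=2}^{n+t+m} Q_1(x_{i-1};dx_i\times dz_i)\,g(x_1,\ldots,z_{n+t+m})\\
&\quad= \int_{(\mathbb{X}\times\mathbb{R}^d)^n} Q_1(x;dx_1\times dz_1)\prod_{i=2}^{n} Q_1(x_{i-1};dx_i\times dz_i)\,f(z_1,\ldots,z_n)\\
&\qquad\times\int_{(\mathbb{X}\times\mathbb{R}^d)^{t-1}}\ \prod_{i=n+1}^{n+t-1} Q_1(x_{i-1};dx_i\times dz_i)\\
&\qquad\times\int_{(\mathbb{X}\times\mathbb{R}^d)^{m+1}}\ \prod_{i=n+t}^{n+t+m} Q_1(x_{i-1};dx_i\times dz_i)\,h(z_{n+t},\ldots,z_{n+t+m}).
\end{aligned}$$

The second term reduces to $\int_{\mathbb{X}^{t-1}}\prod_{i=n+1}^{n+t-1} Q_1(x_{i-1};dx_i\times\mathbb{R}^d) = \int_{\mathbb{X}^{t-1}}\prod_{i=n+1}^{n+t-1} P_1(x_{i-1},dx_i) = \int_{\mathbb{X}} P_{t-1}(x_n;dx_{n+t-1})$. The third is $Q_1^{\otimes(m+1)}(h)(x_{n+t-1})$. Then we have

$$\begin{aligned}
&\mathbb{E}_{(x,0)}\big[f(\xi_1,\ldots,\xi_n)\,h(\xi_{n+t},\ldots,\xi_{n+t+m})\big]\\
&\quad= \int_{(\mathbb{X}\times\mathbb{R}^d)^n} Q_1(x;dx_1\times dz_1)\prod_{i=2}^{n} Q_1(x_{i-1};dx_i\times dz_i)\,f(z_1,\ldots,z_n)\,P_{t-1}\big(Q_1^{\otimes(m+1)}(h)\big)(x_n)\\
&\quad= \mathbb{E}_{(x,0)}\Big[f(\xi_1,\ldots,\xi_n)\,P_{t-1}\big(Q_1^{\otimes(m+1)}h\big)(X_n)\Big] \qquad \text{(using (3.4) with } \mathbb{E}_{(x,0)}\text{)}.
\end{aligned}$$

Then, integrating against the probability measure $\pi(\cdot)$ gives

$$\mathbb{E}_{\pi,0}\big[f(\xi_1,\ldots,\xi_n)\,h(\xi_{n+t},\ldots,\xi_{n+t+m})\big] = \mathbb{E}_{\pi,0}\Big[f(\xi_1,\ldots,\xi_n)\,P_{t-1}\big(Q_1^{\otimes(m+1)}h\big)(X_n)\Big]. \qquad (3.8)$$

Since $\Pi\big(Q_1^{\otimes(m+1)}(h)\big)(x) = \pi\big(Q_1^{\otimes(m+1)}(h)\big)$ for every $x\in\mathbb{X}$, we obtain

$$\begin{aligned}
\mathbb{E}_{\pi,0}\Big[f(\xi_1,\ldots,\xi_n)\,\Pi\big(Q_1^{\otimes(m+1)}(h)\big)(X_n)\Big]
&= \mathbb{E}_{\pi,0}\big[f(\xi_1,\ldots,\xi_n)\big]\,\pi\big(Q_1^{\otimes(m+1)}(h)\big)\\
&= \mathbb{E}_{\pi,0}\big[f(\xi_1,\ldots,\xi_n)\big]\,\mathbb{E}_{\pi,0}\big[h(\xi_{n+t},\ldots,\xi_{n+t+m})\big]
\end{aligned}$$

where the last equality follows from (3.5). Subtracting this identity from (3.8) gives (3.7).

□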



Remark 3.1. A continuous-time counterpart of Lemma 3.3 can also be proved. We restrict
ourselves to the discrete-time version because it is the only one needed in the paper.



3.2 $\rho$-mixing property of $(Y_n - Y_{n-1})_{n\ge 1}$ for discrete-time stationary MAPs

Let us recall some basic facts on the $\rho$-mixing of a (strictly) stationary sequence of random
variables $(\xi_n)_{n\in\mathbb{N}^*}$ (e.g. see [11]). For any $p \in \mathbb{N}^*$ and $q \in \mathbb{N}^* \cup \{\infty\}$ with $p \le q$,
$\mathcal{G}_p^q := \sigma(\xi_p,\dots,\xi_q)$ denotes the $\sigma$-algebra generated by the random variables $\xi_p,\dots,\xi_q$. The $\rho$-mixing
coefficient at horizon $t > 0$, denoted by $\rho(t)$, is defined by

\[ \rho(t) := \sup_{n\in\mathbb{N}^*} \sup\big\{ |\mathrm{Corr}(f;h)| \ :\ f \in \mathbb{L}^2(\mathcal{G}_1^n),\ h \in \mathbb{L}^2(\mathcal{G}_{n+t}^{\infty}) \big\}, \tag{3.9} \]

where $\mathrm{Corr}(f;h)$ is the correlation coefficient of the two random variables $f$ and $h$. In fact, the
$\rho$-mixing coefficient may be computed as follows from [11, Prop 3.18]: for $t > 0$,

\[ \rho(t) = \sup_{n\in\mathbb{N}^*} \sup_{m\in\mathbb{N}^*} \sup\big\{ |\mathrm{Corr}(f;h)| \ :\ f \in \mathbb{L}^2(\mathcal{G}_1^n),\ h \in \mathbb{L}^2(\mathcal{G}_{n+t}^{n+t+m}) \big\}. \tag{3.10} \]

The stationary sequence $(\xi_n)_{n\in\mathbb{N}^*}$ is said to be $\rho$-mixing if

\[ \lim_{t\to+\infty} \rho(t) = 0. \]

We know from condition (AS2) that $(X_n)_{n\in\mathbb{N}}$ is $\rho$-mixing [81]. In the special case when
$Y_n := \sum_{k=1}^{n} \xi(X_k)$, it is clear that $(Y_n - Y_{n-1})_{n\ge 1}$ is also $\rho$-mixing from [11, p. 28]. We
extend this fact to general $(Y_n)_{n\in\mathbb{N}}$ in the next proposition.

Proposition 3.1 ($\rho$-mixing). Under (AS1-AS2), the stationary sequence $(\xi_n := Y_n - Y_{n-1})_{n\in\mathbb{N}^*}$
is $\rho$-mixing at an exponential rate: there exists $\varepsilon > 0$ such that

\[ \rho(t) = O(\exp(-\varepsilon t)). \]

Proof. For the sake of simplicity, assume that $d = 1$. First, note that the random variables
$f$ and $h$ in (3.10) can be assumed to be of $\mathbb{L}^2$-norm 1. Thus, we just have to deal with
covariances. Recall that $(\xi_n := Y_n - Y_{n-1})_{n\in\mathbb{N}^*}$ is known to be stationary under (AS1) from
Corollary 3.1. The $\sigma$-algebras $\mathcal{G}_1^n$ and $\mathcal{G}_{n+t}^{n+t+m}$ in the mixing coefficients will be relative to
the stationary sequence $(\xi_n)_{n\in\mathbb{N}^*}$. Second, let us consider two $\mathbb{L}^2$-normed random variables
$f(\xi_1,\dots,\xi_n) \in \mathbb{L}^2(\mathcal{G}_1^n)$ and $h(\xi_{n+t},\dots,\xi_{n+t+m}) \in \mathbb{L}^2(\mathcal{G}_{n+t}^{n+t+m})$. For any $m \ge 1$, the map
$x \mapsto Q_1^{\otimes(m+1)}(h)(x)$ belongs to $\mathbb{L}^2(\pi)$ and we have $\|Q_1^{\otimes(m+1)}(h)\|_2 \le 1$ from Corollary 3.1 (see (3.6)).
Since $P_t$ and $\Pi$ are contractions on $\mathbb{L}^2(\pi)$, we have $(P_{t-1} - \Pi)\big(Q_1^{\otimes(m+1)}(h)\big) \in \mathbb{L}^2(\pi)$.
The Cauchy-Schwarz inequality and the last comments allow us to write from (3.7)

\begin{align*}
\mathrm{Cov}(f;h)^2 &\le \mathbb{E}_{\pi,0}\big[|f(\xi_1,\dots,\xi_n)|^2\big]\ \mathbb{E}_{\pi,0}\Big[\big|(P_{t-1} - \Pi)\big(Q_1^{\otimes(m+1)}(h)\big)(X_n)\big|^2\Big] \\
&= \mathbb{E}_{\pi,0}\Big[\big|(P_{t-1} - \Pi)\big(Q_1^{\otimes(m+1)}(h)\big)(X_0)\big|^2\Big] \quad \text{($\pi$ is $P_n$-invariant)} \\
&= \big\|(P_{t-1} - \Pi)\big(Q_1^{\otimes(m+1)}(h)\big)\big\|_2^2 \\
&\le \|P_{t-1} - \Pi\|_2^2\ \big\|Q_1^{\otimes(m+1)}(h)\big\|_2^2 \quad \text{(since $Q_1^{\otimes(m+1)}(h) \in \mathbb{L}^2(\pi)$)} \\
&\le \|P_{t-1} - \Pi\|_2^2 \quad \text{(since $\|Q_1^{\otimes(m+1)}(h)\|_2 \le 1$).}
\end{align*}

Therefore, it follows that for every $t \ge 1$:

\[ \sup\big\{ |\mathrm{Corr}(f;h)| \ :\ f \in \mathbb{L}^2(\mathcal{G}_1^n),\ h \in \mathbb{L}^2(\mathcal{G}_{n+t}^{n+t+m}) \big\} \le \|P_{t-1} - \Pi\|_2. \]

The right-hand side of the inequality above does not depend on $m$ and $n$, so that we
obtain from (3.10)

\[ \rho(t) \le \|P_{t-1} - \Pi\|_2. \]

The proof is completed by using the exponential estimate (2.2) of $\|P_{t-1} - \Pi\|_2$ under (AS2).

□
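
The bound $\rho(t) \le \|P_{t-1} - \Pi\|_2$ can be illustrated on a toy example. The sketch below is not taken from the paper: it assumes a hypothetical two-state chain with transition matrix $P = [[1-a, a],[b, 1-b]]$, for which the orthogonal complement of the constants in $\mathbb{L}^2(\pi)$ is spanned by a single unit function $f_0$, so that $\|P_n - \Pi\|_2 = \|P^n f_0\|_2$. All function names are illustrative.

```python
import math

def stationary_two_state(a, b):
    """Invariant law of the two-state chain with P = [[1-a, a], [b, 1-b]]."""
    return (b / (a + b), a / (a + b))

def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """n-th power of P by repeated multiplication (n >= 0)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

def gap_norm(a, b, n):
    """L2(pi) operator norm of P^n - Pi for the two-state chain."""
    pi = stationary_two_state(a, b)
    P = [[1.0 - a, a], [b, 1.0 - b]]
    # f0 is centered and of unit norm in L2(pi); any f splits as c*1 + d*f0,
    # and (P^n - Pi) kills the constant part, so the norm is attained at f0.
    f0 = [math.sqrt(pi[1] / pi[0]), -math.sqrt(pi[0] / pi[1])]
    Pn = mat_pow(P, n)
    g = [sum(Pn[i][j] * f0[j] for j in range(2)) for i in range(2)]
    return math.sqrt(sum(pi[i] * g[i] ** 2 for i in range(2)))

if __name__ == "__main__":
    # Upper bound on rho(t) given by Proposition 3.1, for t = 1, ..., 6.
    for t in range(1, 7):
        print(t, gap_norm(0.3, 0.2, t - 1))
```

For this chain the norm equals $|1 - a - b|^{t-1}$, so the decay is geometric, in line with the exponential rate of Proposition 3.1.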



3.3 Central limit theorem for the Markov additive processes 

In a first step, we consider a discrete-time $\mathbb{X}\times\mathbb{R}^d$-valued MAP, $(X_n,Y_n)_{n\in\mathbb{N}}$, for which
the driving Markov chain $(X_n)_{n\in\mathbb{N}}$ is assumed to satisfy (AS1-AS2). Recall that Condition
(AS3d) for $\alpha = 2$ is

\[ \mathbb{E}_{\pi,0}\big[|Y_1|^2\big] < \infty. \]

This condition implies that $\mathbb{E}_{\pi,0}[|Y_1|] < \infty$, and we suppose that $\mathbb{E}_{\pi,0}[Y_1] = 0$ for convenience
(if not, replace $Y_n$ by $Y_n - \mathbb{E}_{\pi,0}[Y_n] = Y_n - n\,\mathbb{E}_{\pi,0}[Y_1]$ from Corollary 3.1).

We know from Proposition 3.1 that $(Y_{n+1} - Y_n)_{n\in\mathbb{N}}$ is stationary and is exponentially
$\rho$-mixing when (AS1-AS2) hold. Then, under the expected moment condition $\mathbb{E}_{\pi,0}[|Y_1|^2] < \infty$,
the CLT for $(Y_n)_{n\in\mathbb{N}^*}$ follows from [52, 73] (e.g. see [11, Th 11.4]). To the best of our
knowledge, Theorem 3.1 for general MAP is new. The notation $\mathcal{N}(0,0)$ stands for the Dirac
distribution at 0.

Theorem 3.1. Under (AS1-AS2) and (AS3d) for $\alpha = 2$, $(Y_n/\sqrt{n})_{n\in\mathbb{N}^*}$ converges in distribution
when $n \to +\infty$ to the $d$-dimensional Gaussian law $\mathcal{N}(0,\Sigma)$, where $\Sigma$ is the asymptotic
covariance $d\times d$-matrix

\[ \Sigma := \lim_{n} \frac{1}{n}\, \mathbb{E}_{\pi,0}\big[Y_n Y_n^*\big], \]

where the symbol $*$ denotes the transpose operator.






That $(Y_n/\sqrt{n})_{n\in\mathbb{N}^*}$ satisfies the CLT under the condition $\mathbb{E}_{\pi,0}[|Y_1|^2] < \infty$ was known in
some cases. Such standard situations are recalled in the next two remarks (with $d = 1$ to
simplify).

Remark 3.2 (Martingale method). If there exists a measurable function $\xi : \mathbb{X} \to \mathbb{R}$ such that
$Y_n - Y_{n-1} = \xi(X_n)$ and $\mathbb{E}_{\pi,0}[|Y_1|^2] = \pi(\xi^2) < \infty$, then $(Y_n/\sqrt{n})_{n\in\mathbb{N}^*}$ converges in distribution
to the Gaussian law $\mathcal{N}(0,\sigma^2)$ where $\sigma^2 = \pi(\xi^2) + 2\sum_{k\ge 1} \pi(\xi P^k \xi) \in [0,+\infty)$. This result
follows from the Gordin-Lifsic theorem [34]. Indeed, (AS2) implies that $(X_n)_{n\in\mathbb{N}}$ is ergodic
and that there is a solution $\tilde{\xi} \in \mathbb{L}^2(\pi)$ to the Poisson equation: $\tilde{\xi} - P\tilde{\xi} = \xi$. Then, the
difference martingale method of [34] can be used to obtain the CLT.

Remark 3.3 (Uniform ergodicity). Recall that the Markov chain $(X_n)_{n\in\mathbb{N}}$ is said to be uniformly
ergodic if $\lim_{n\to+\infty} \|P_n - \Pi\|_\infty = 0$. This property implies (AS2) (but is stronger)
and is fulfilled if and only if $(X_n)_{n\in\mathbb{N}}$ is ergodic, aperiodic and satisfies the Doeblin condition
$(D_0)$. In addition, for an aperiodic and ergodic Markov chain $(X_n)_{n\in\mathbb{N}}$, Doeblin's condition is
equivalent to the uniform mixing (or $\varphi$-mixing) property, and then, the $\varphi$-mixing coefficients
go to 0 at least exponentially fast (see [10, 81]).

Set $\xi_n := Y_n - Y_{n-1}$. If $(X_n)_{n\in\mathbb{N}}$ is uniformly ergodic and if $\mathbb{E}_{\pi,0}[\xi_1^2] < \infty$, then the real
number $\sigma^2 = \mathbb{E}_{\pi,0}[\xi_1^2] + 2\sum_{\ell\ge 2} \mathbb{E}_{\pi,0}[\xi_1 \xi_\ell]$ is well-defined in $[0,+\infty)$. If $\sigma^2 > 0$, then the
sequence $(Y_n/\sqrt{n})_{n\in\mathbb{N}^*}$ converges in distribution to $\mathcal{N}(0,\sigma^2)$ [38]. This CLT is established
as follows: the stationarity and the uniform ergodicity of $(X_n)_{n\in\mathbb{N}}$ extend to the sequence
$(\xi_n)_{n\in\mathbb{N}^*}$, and the $\varphi$-mixing coefficients of $(\xi_n)_{n\in\mathbb{N}^*}$ also go to 0 at an exponential rate (see [38,
Rk. 4, Lem. 1]). The proof is completed using [53, Th.18.5.2].

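The CLT for a continuous-time MAP $(X_t,Y_t)_{t\ge 0}$ is deduced from the discrete-time statement.

Theorem 3.2. Under (AS1-AS2) and (AS3) with $\alpha = 2$, $(Y_t/\sqrt{t})_{t>0}$ satisfies a CLT.

Proof. For $t \in [0,+\infty)$, we set $t = n + v$, where $n$ is the integer part of $t$ and $v \in [0,1)$. We
can write:

\[ \frac{Y_t}{\sqrt{t}} = \frac{Y_t - Y_n}{\sqrt{t}} + \frac{\sqrt{n}}{\sqrt{t}}\,\frac{Y_n}{\sqrt{n}}. \tag{3.11} \]

Recall that $(Q_t)_{t\ge 0}$ is the transition semigroup of $(X_t,Y_t)_{t\ge 0}$. It is easily checked that the
MAP $(X_t,Y_t)_{t\ge 0}$ "sampled" at discrete instants, $(X_n,Y_n)_{n\in\mathbb{N}}$, is a discrete-time MAP with
transition kernel $Q := Q_1$ which satisfies (AS1-AS2) and (AS3d). Therefore, $(Y_n/\sqrt{n})_{n\in\mathbb{N}^*}$
satisfies a CLT thanks to Theorem 3.1. Finally, $((Y_t - Y_n)/\sqrt{t})_{t>0}$ converges in
probability to 0 from the Tchebychev inequality and condition (AS3): for every $\varepsilon > 0$,

\begin{align*}
\mathbb{P}_{\pi,0}\big\{|Y_t - Y_n| > \sqrt{t}\,\varepsilon\big\} &= \mathbb{P}_{\pi,0}\big\{|Y_v| > \sqrt{t}\,\varepsilon\big\} \quad \text{(stationary increments)} \\
&\le \frac{\mathbb{E}_{\pi,0}[|Y_v|^2]}{t\,\varepsilon^2} \le \frac{\sup_{v\in(0,1]} \mathbb{E}_{\pi,0}[|Y_v|^2]}{t\,\varepsilon^2} \xrightarrow[t\to+\infty]{} 0.
\end{align*}

Since $\sqrt{n}/\sqrt{t} \to 1$, $(Y_t/\sqrt{t})_{t>0}$ satisfies a CLT from (3.11). □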

Remark 3.4 (FCLT). Proposition 3.1 allows us to deduce from [8, Th. 19.2] that a functional
central limit theorem also holds ($d = 1$). That is, under the assumptions of Theorem 3.1, we
have:

\[ \Big(\frac{Y_{\lfloor nt \rfloor}}{\sqrt{n}}\Big)_{t\ge 0} \Longrightarrow (B_t)_{t\ge 0} \tag{3.12} \]

as random elements of $D[0,\infty)$, the Skorokhod space of cadlag functions on $[0,\infty)$, and where
$(B_t)_{t\ge 0}$ is a Brownian motion with zero drift and some variance parameter. Let us give some
comments on the FCLT relevant to our context.

(a) The case of a discrete-time MAP $(X_n,Y_n)_{n\in\mathbb{N}}$ with $(X_n)_{n\in\mathbb{N}}$ satisfying the Doeblin condition
is covered by [38]. $(Y_n)_{n\in\mathbb{N}}$ is shown to be $\varphi$-mixing and a FCLT for $\varphi$-mixing
sequences is used. We extend their approach to our case of $\mathbb{L}^2$-spectral gap.

(b) Under (AS1-AS2) and the expected moment condition of order 2, Maigret [65] has established
a FCLT for $(Y_n := \sum_{k=1}^{n} \xi(X_{k-1},X_k))_{n\in\mathbb{N}^*}$ in the specific case where $(X_n)_{n\in\mathbb{N}}$ is
Harris-recurrent. It is worth noticing that Condition (AS2) cannot be compared with
the Harris-recurrence property.

(c) If $(X_t)_{t\ge 0}$ is a stationary ergodic Markov process with a strongly continuous transition
semigroup $(P_t)_{t\ge 0}$ on $\mathbb{L}^2(\pi)$, the following convergence holds for any $\xi \in \mathbb{L}^2(\pi)$ such that
$\pi(\xi) = 0$ [7, Th. 2.1, Prop. 2.3] (see also [84] in the Harris-recurrent case):

\[ \Big(\frac{1}{\sqrt{n}} \int_0^{nt} \xi(X_s)\,ds\Big)_{t\ge 0} \Longrightarrow (B_t)_{t\ge 0}, \]

where $(B_t)_{t\ge 0}$ is a Brownian motion with zero drift and some variance parameter. Set
$Y_t := \int_0^t \xi(X_s)\,ds$. Since $\xi \in \mathbb{L}^2(\pi)$, we have $\mathbb{E}_{\pi,0}[|Y_t|^2] \le \pi(|\xi|^2)$ for every $t \in (0,1]$, thus
(AS3) with $\alpha = 2$ is true. Then, the convergence result above is easily deduced from
(3.12) using the discrete-time stationary MAP $(X_n,Y_n)_{n\ge 1}$ introduced in the proof of
Theorem 3.2.

(d) Glynn and Whitt deal with the integral functional of a regenerative process in [30, 31].
Their results apply to a Markov process which is a specific instance of a regenerative process.
Conditions for the CLT (FCLT) to hold are expressed in terms of a second moment
on the increments $Y_1 := \int_0^{T_1} \xi(X_s)\,ds$ of the process $(Y_t)_{t\ge 0}$ over a regeneration cycle of
length $T_1$ (and an additional condition of negligibility in probability of $\sup_{0\le s\le T_1} |Y_s|$).
The fact that we only consider the Markov case makes our conditions easier to check.

4 Refinements of the central limit theorem for MAPs 

Let $(X_t,Y_t)_{t\in\mathbb{T}}$ be a MAP taking values in $\mathbb{X}\times\mathbb{R}^d$. The canonical scalar product on $\mathbb{R}^d$ is
denoted by $\langle\cdot,\cdot\rangle$. The Fourier operators associated with $(X_t,Y_t)_{t\in\mathbb{T}}$ are introduced in the next
subsection and are shown to satisfy a semigroup property. In the discrete-time case, precise
expansions of the characteristic function of the additive component $Y_t$ can be deduced from
[49] under (AS2). They are central to the derivation of our limit theorems in this section.
Limit theorems are first considered for discrete-time MAPs. A local limit theorem, a Berry-Esseen
bound and a first-order Edgeworth expansion are obtained. The continuous-time case
is addressed thanks to the basic reduction to the discrete-time case used for the CLT.

4.1 Fourier operators. A semigroup property 

For any $t \in \mathbb{T}$ and $\zeta \in \mathbb{R}^d$, we consider the linear operator $S_t(\zeta)$ acting (in a first step) on
the space of bounded measurable functions $f : \mathbb{X} \to \mathbb{C}$ as follows:

\[ \forall x \in \mathbb{X}, \quad (S_t(\zeta)f)(x) := \mathbb{E}_{(x,0)}\big[e^{i\langle\zeta,Y_t\rangle} f(X_t)\big]. \tag{4.1} \]

Note that $S_t(0) = P_t$. In the discrete-time case, $S_1(\zeta)$ corresponds to the Fourier operator
which was first introduced by Nagaev [68] in the special case when $Y_n = \sum_{k=1}^{n} \xi(X_k)$ (see
[42, 44] and the references therein), and was extended to discrete-time MAPs in [4, 40] to
prove local limit theorems and renewal theorems (see also [26]). All these works are based
on the following formula (see Proposition 4.1 below):

\[ \forall \zeta \in \mathbb{R}^d,\ \forall n \in \mathbb{N}, \quad (S_n(\zeta)f)(x) := \mathbb{E}_{(x,0)}\big[e^{i\langle\zeta,Y_n\rangle} f(X_n)\big] = \big(S_1(\zeta)^n f\big)(x). \tag{4.2} \]

This formula clearly reads as the semigroup property: $S_{m+n}(\zeta) = S_m(\zeta) \circ S_n(\zeta)$. In the
continuous-time case, it seems that the operators $S_t(\zeta)$ were first introduced in [50] for investigating
AFs of continuous-time Markov processes on a compact metric state space $\mathbb{X}$. In [50], $P_t$
was assumed to have a spectral gap on the space of all continuous $\mathbb{C}$-valued functions on $\mathbb{X}$,
and $(S_t(\zeta))_{t\ge 0}$ was thought of as a semigroup (see (SG) below) on this space.

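Here, in view of (AS2), the above-mentioned semigroup property has to be considered on
the Lebesgue spaces $\mathbb{L}^p(\pi)$ ($1 \le p \le \infty$).

Proposition 4.1. For all $t \in \mathbb{T}$ and $\zeta \in \mathbb{R}^d$, $S_t(\zeta)$ defines a linear contraction on $\mathbb{L}^p(\pi)$, and
we have:

\[ \forall \zeta \in \mathbb{R}^d,\ \forall (s,t) \in \mathbb{T}^2, \quad S_{t+s}(\zeta) = S_t(\zeta) \circ S_s(\zeta). \tag{SG} \]

In particular, Relation (4.2) holds for all $f \in \mathbb{L}^p(\pi)$.

Proof. The first assertion is easy to prove. Next, for any $\zeta \in \mathbb{R}^d$ and $f \in \mathbb{L}^p(\pi)$, let us
set: $g(x,y) := f(x)\,e^{i\langle\zeta,y\rangle}$ with $x \in \mathbb{X}$ and $y \in \mathbb{R}^d$. Then, using the Markov property and
Lemma 3.1:

\begin{align*}
(S_{t+s}(\zeta)f)(x) &:= \mathbb{E}_{(x,0)}\big[e^{i\langle\zeta,Y_{t+s}\rangle} f(X_{t+s})\big] \\
&= \mathbb{E}_{(x,0)}\Big[\mathbb{E}_{(x,0)}\big[e^{i\langle\zeta,Y_{t+s}\rangle} f(X_{t+s}) \,\big|\, \mathcal{F}_s^{(X,Y)}\big]\Big] = \mathbb{E}_{(x,0)}\big[(Q_t\, g_{Y_s})(X_s,0)\big] \\
&= \mathbb{E}_{(x,0)}\big[e^{i\langle\zeta,Y_s\rangle} (Q_t\, g)(X_s,0)\big] = \mathbb{E}_{(x,0)}\Big[e^{i\langle\zeta,Y_s\rangle}\, \mathbb{E}_{(X_s,0)}\big[f(X_t)\,e^{i\langle\zeta,Y_t\rangle}\big]\Big] \\
&= \mathbb{E}_{(x,0)}\big[e^{i\langle\zeta,Y_s\rangle} (S_t(\zeta)f)(X_s)\big] = \big(S_s(\zeta)(S_t(\zeta)f)\big)(x);
\end{align*}

the third equality results from: $g_{Y_s}(x,y) = f(x)\,e^{i\langle\zeta,y+Y_s\rangle} = e^{i\langle\zeta,Y_s\rangle} g(x,y)$. This gives the
semigroup property (SG). The last assertion is obvious. □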
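
Formula (4.2) can be checked numerically in the simplest setting. The sketch below is only an illustration under stated assumptions: a hypothetical two-state chain with transition matrix `P`, an additive functional $Y_n = \sum_{k=1}^{n}\xi(X_k)$ with values `XI`, and $d = 1$. The left-hand side of (4.2) is computed by enumerating all paths, the right-hand side by iterating the matrix of $S_1(\zeta)$.

```python
import cmath
from itertools import product

# Hypothetical two-state chain and functional xi (illustrative values only).
P = [[0.7, 0.3], [0.2, 0.8]]
XI = [3.0, -2.0]

def fourier_operator(zeta):
    """Matrix of S_1(zeta): entry (x, y) is P(x, y) * exp(i * zeta * xi(y))."""
    return [[P[x][y] * cmath.exp(1j * zeta * XI[y]) for y in range(2)]
            for x in range(2)]

def apply(M, f):
    """Apply a 2x2 kernel M to a function f given as a list of values."""
    return [sum(M[x][y] * f[y] for y in range(2)) for x in range(2)]

def expectation_by_paths(zeta, n, f, x):
    """E_{(x,0)}[exp(i zeta Y_n) f(X_n)] by enumerating all paths of length n."""
    total = 0j
    for path in product(range(2), repeat=n):
        prob, y, prev = 1.0, 0.0, x
        for state in path:
            prob *= P[prev][state]   # probability of the next transition
            y += XI[state]           # increment Y_k - Y_{k-1} = xi(X_k)
            prev = state
        total += prob * cmath.exp(1j * zeta * y) * f[prev]
    return total

def expectation_by_operator(zeta, n, f, x):
    """(S_1(zeta)^n f)(x), obtained by applying S_1(zeta) n times."""
    S = fourier_operator(zeta)
    g = list(f)
    for _ in range(n):
        g = apply(S, g)
    return g[x]

if __name__ == "__main__":
    f = [1.0, 2.0]
    for x in (0, 1):
        print(expectation_by_paths(0.7, 6, f, x),
              expectation_by_operator(0.7, 6, f, x))
```

The path enumeration costs $2^n$ operations while the operator iteration is linear in $n$, which is one practical reason why the spectral analysis of $S_1(\zeta)$ is the natural route to the expansions of the next subsection.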



4.2 Expansions of the characteristic function of the additive component 

Here we assume that $(X_n,Y_n)_{n\in\mathbb{N}}$ is a discrete-time MAP taking values in $\mathbb{X}\times\mathbb{R}^d$ (possibly
derived from a continuous-time MAP) such that the driving Markov chain $(X_n)_{n\in\mathbb{N}}$ is stationary
and satisfies (AS2). This last property ensures that $S_1(0)$ has good spectral properties,
and the iterates $S_1(\zeta)^n$ occurring in (4.2) are studied using the Nagaev-Guivarc'h spectral
method which consists in applying the perturbation theory to the Fourier operators $S_1(\zeta)$
for small $\zeta$. However, using the standard perturbation theorem requires strong assumptions
on $Y_1$. Here we shall appeal to the weak spectral method introduced in [45] and based on
the Keller-Liverani perturbation theorem [60]. This method is fully developed in the Markov
framework in [49, see references therein]. In the sequel, $F^{(\ell)}$ denotes the partial derivative of
order $\ell$ of a $\mathbb{C}$-valued function $F$ defined on an open subset of $\mathbb{R}^d$.

Conditions (AS1-AS2) are assumed to hold throughout the subsection. 
Proposition 4.2. Let $m_0 \in \mathbb{N}^*$. Under condition (AS3d) for some $\alpha > m_0$, there exists a
bounded open neighborhood $\mathcal{O}$ of $\zeta = 0$ in $\mathbb{R}^d$ such that we have for all $f \in \mathbb{L}^s(\pi)$ with any
$s > \frac{\alpha}{\alpha - m_0}$:

\[ \forall n \in \mathbb{N},\ \forall \zeta \in \mathcal{O}, \quad \mathbb{E}_{\pi,0}\big[e^{i\langle\zeta,Y_n\rangle} f(X_n)\big] = \lambda(\zeta)^n L(\zeta,f) + R_n(\zeta,f), \tag{4.3} \]

where $\lambda(\cdot)$, $L(\cdot,f)$, $R_n(\cdot,f)$ are $\mathbb{C}$-valued functions of class $C^{m_0}$ on $\mathcal{O}$, with $\lambda(0) = 1$ and
$L(0,f) = \pi(f)$. Moreover, we have the following properties for $\ell = 0,\dots,m_0$:

- $\sup_{\zeta\in\mathcal{O}} |L^{(\ell)}(\zeta,f)| < \infty$   (4.4a)

- $\exists \kappa \in (0,1)$, $\sup_{\zeta\in\mathcal{O}} |R_n^{(\ell)}(\zeta,f)| = O(\kappa^n)$.   (4.4b)

If $f := 1_{\mathbb{X}}$, we have $R_n(0,1_{\mathbb{X}}) = 0$.

When $Y_n = \sum_{k=1}^{n} \xi(X_k)$, the above properties are proved in [49, Sect. 7.3] by using (4.2)
and some operator-type derivation arguments. For a general additive component $Y_n$, the
method is the same$^1$ using Lemmas 4.1 and 4.2 below which slightly extend [49, Lem. 4.2, 7.4].
Mention that, by using the same lemmas, Proposition 4.2 can also be deduced from [35] which
specifies the method introduced in [36, 45] to prove Taylor expansions of $\lambda(\cdot)$, $L(\cdot,f)$, $R_n(\cdot,f)$.$^2$

The operator norm in the space $\mathcal{L}(\mathbb{L}^p,\mathbb{L}^{p'})$ of the linear bounded operators from $\mathbb{L}^p(\pi)$ to
$\mathbb{L}^{p'}(\pi)$ is denoted by $\|\cdot\|_{p,p'}$.

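Lemma 4.1. If $1 \le p' < p$, then the map $\zeta \mapsto S_1(\zeta)$ is continuous from $\mathbb{R}^d$ to $\mathcal{L}(\mathbb{L}^p,\mathbb{L}^{p'})$.

Proof. We have for $\zeta \in \mathbb{R}^d$, $\zeta_0 \in \mathbb{R}^d$ and $f \in \mathbb{L}^p(\pi)$, thanks to Hölder's inequality,

\begin{align*}
\big|(S_1(\zeta) - S_1(\zeta_0))f(x)\big|^{p'} &= \big|\mathbb{E}_{(x,0)}[e^{i\langle\zeta,Y_1\rangle} f(X_1)] - \mathbb{E}_{(x,0)}[e^{i\langle\zeta_0,Y_1\rangle} f(X_1)]\big|^{p'} \\
&\le \mathbb{E}_{(x,0)}\big[\,|e^{i\langle\zeta-\zeta_0,Y_1\rangle} - 1|^{p'}\, |f(X_1)|^{p'}\big] \\
&\le 2^{p'}\, \mathbb{E}_{(x,0)}\big[\min\{1, |\langle\zeta-\zeta_0,Y_1\rangle|\}^{p'}\, |f(X_1)|^{p'}\big],
\end{align*}

the last inequality resulting from the classic inequality $|e^{ia} - 1| \le 2\min\{1,|a|\}$. An integration
with respect to $\pi$ and the use of Hölder's inequality give

\begin{align*}
\pi\big(|(S_1(\zeta) - S_1(\zeta_0))f|^{p'}\big) &\le 2^{p'}\, \mathbb{E}_{\pi,0}\big[\min\{1, |\langle\zeta-\zeta_0,Y_1\rangle|\}^{\frac{pp'}{p-p'}}\big]^{\frac{p-p'}{p}}\ \mathbb{E}_{\pi,0}\big[|f(X_1)|^{p}\big]^{\frac{p'}{p}} \\
&\le 2^{p'}\, \big\|\min\{1, |\langle\zeta-\zeta_0,Y_1\rangle|\}\big\|_{pp'/(p-p')}^{p'}\ \|f\|_p^{p'}
\end{align*}

since $\pi$ is invariant. Thus, we deduce that $\|S_1(\zeta) - S_1(\zeta_0)\|_{p,p'} \le 2\,\big\|\min\{1, |\langle\zeta-\zeta_0,Y_1\rangle|\}\big\|_{pp'/(p-p')}$
goes to 0 when $|\zeta - \zeta_0| \to 0$ from Lebesgue's theorem. □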

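Lemma 4.2. Assume that (AS3d) holds for some $\alpha > m_0$ ($m_0 \in \mathbb{N}^*$), and let $1 \le j \le m_0$.
If $p > 1$ and $p_j := \alpha p/(\alpha + jp) > 1$, then $\zeta \mapsto S_1(\zeta)$ is $j$-times continuously differentiable
from $\mathbb{R}^d$ to $\mathcal{L}(\mathbb{L}^p,\mathbb{L}^{p_j})$, and $\sup_{\zeta\in\mathbb{R}^d} \|S_1^{(j)}(\zeta)\|_{p,p_j} \le \mathbb{E}_{\pi,0}[|Y_1|^\alpha]^{j/\alpha}$.

Proof. For the sake of simplicity, we suppose that $d = 1$. Below we consider any $\zeta \in \mathbb{R}$,
$\zeta_0 \in \mathbb{R}$ and $f \in \mathbb{L}^p(\pi)$. For $1 \le j \le m_0$, define (formally) the following linear operator:

\[ \forall x \in \mathbb{X}, \quad (S_1^{(j)}(\zeta)f)(x) := \mathbb{E}_{(x,0)}\big[(iY_1)^j\, e^{i\zeta Y_1} f(X_1)\big]. \]

First we have:

\[ \big|(S_1^{(j)}(\zeta)f)(x)\big|^{p_j} \le \mathbb{E}_{(x,0)}\big[|Y_1|^{jp_j}\, |f(X_1)|^{p_j}\big], \]

so that, from Hölder's inequality (with the conjugate exponents $\alpha/(jp_j)$ and $p/p_j$),

\[ \|S_1^{(j)}(\zeta)f\|_{p_j} \le \mathbb{E}_{\pi,0}\big[|Y_1|^\alpha\big]^{j/\alpha}\, \|f\|_p. \]

Second, define $\Delta := S_1^{(j-1)}(\zeta) - S_1^{(j-1)}(\zeta_0) - (\zeta - \zeta_0)\, S_1^{(j)}(\zeta_0)$, with the convention $S_1^{(0)} := S_1$. Then, for $j \in \{1,\dots,m_0\}$,
we have thanks to the classic inequality $|e^{ia} - 1 - ia| \le 2|a|\min\{1,|a|\}$

\[ |\Delta f(x)|^{p_j} \le 2^{p_j}\, |\zeta - \zeta_0|^{p_j}\ \mathbb{E}_{(x,0)}\big[\min\{1, |(\zeta-\zeta_0)Y_1|\}^{p_j}\, |Y_1|^{jp_j}\, |f(X_1)|^{p_j}\big]. \]

It follows from Hölder's inequality that the operator norm satisfies

\[ \|\Delta\|_{p,p_j} \le 2\,|\zeta - \zeta_0|\ \big\|\min\{1, |(\zeta-\zeta_0)Y_1|\}\, |Y_1|^j\big\|_{\alpha/j}. \]

This proves that $S_1^{(j-1)}(\cdot)$ is differentiable from $\mathbb{R}$ to $\mathcal{L}(\mathbb{L}^p,\mathbb{L}^{p_j})$, and that its derivative is
$S_1^{(j)}(\cdot)$. Finally, we obtain:

\[ \big|(S_1^{(j)}(\zeta) - S_1^{(j)}(\zeta_0))f(x)\big|^{p_j} \le 2^{p_j}\, \mathbb{E}_{(x,0)}\big[\min\{1, |(\zeta-\zeta_0)Y_1|\}^{p_j}\, |Y_1|^{jp_j}\, |f(X_1)|^{p_j}\big], \]

from which we deduce that the operator norm satisfies

\[ \|S_1^{(j)}(\zeta) - S_1^{(j)}(\zeta_0)\|_{p,p_j} \le 2\, \big\|\min\{1, |(\zeta-\zeta_0)Y_1|\}\, |Y_1|^j\big\|_{\alpha/j}. \]

Thus $S_1(\cdot)$ is $j$-times continuously differentiable from $\mathbb{R}$ to $\mathcal{L}(\mathbb{L}^p,\mathbb{L}^{p_j})$.

□

$^1$ See the beginning of the appendix. In particular, mention that $\lambda(\zeta)$ is the dominant eigenvalue of $S_1(\zeta)$,
$L(\zeta,\cdot)$ is related to the associated eigenprojection, and $\kappa$ can be chosen as $\kappa = (e^{-\varepsilon} + 1)/2$ where $\varepsilon > 0$ is
defined in (2.2).

$^2$ As observed in [35], the passage from the Taylor expansions to the differentiability properties can be
derived from [13].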



Next, let us return to our probabilistic context. Let $\nabla$ and $\mathrm{Hess}$ denote the gradient and
the Hessian operators respectively. In the following proposition, the $d$-dimensional vector
$\nabla\lambda(0)$ and the symmetric $d\times d$-matrix $\mathrm{Hess}\,\lambda(0)$ are related to the mean vector $\mathbb{E}_{\pi,0}[Y_1]$ and
the asymptotic covariance matrix associated with the sequence $(Y_n - n\,\mathbb{E}_{\pi,0}[Y_1])/\sqrt{n}$.

Proposition 4.3.

(i) If (AS3d) holds for some $\alpha > 1$, then $\nabla\lambda(0) = i\,\mathbb{E}_{\pi,0}[Y_1]$.

(ii) If (AS3d) holds for some $\alpha > 2$, then the following limit exists in the set of the non-negative
symmetric $d\times d$-matrices:

\[ \Sigma := \lim_{n} \frac{1}{n}\, \mathbb{E}_{\pi,0}\big[(Y_n - n\,\mathbb{E}_{\pi,0}[Y_1])\,(Y_n - n\,\mathbb{E}_{\pi,0}[Y_1])^*\big] = -\mathrm{Hess}\,\lambda(0). \]

Proof. Assume that $d = 1$ for the sake of simplicity (for $d \ge 2$, the proof is similar by using
partial derivatives). By differentiating at $\zeta = 0$ the equality $\mathbb{E}_{\pi,0}[e^{i\zeta Y_n}] = \lambda(\zeta)^n L(\zeta,1_{\mathbb{X}}) + R_n(\zeta,1_{\mathbb{X}})$
of Proposition 4.2, we obtain:

\[ i\,\mathbb{E}_{\pi,0}[Y_n] = n\,\lambda^{(1)}(0) + L^{(1)}(0,1_{\mathbb{X}}) + R_n^{(1)}(0,1_{\mathbb{X}}). \]

Since $\mathbb{E}_{\pi,0}[Y_n] = n\,\mathbb{E}_{\pi,0}[Y_1]$ (from Corollary 3.1), we deduce that $\lambda^{(1)}(0) = i \lim_n \mathbb{E}_{\pi,0}[Y_n]/n = i\,\mathbb{E}_{\pi,0}[Y_1]$
from (4.4b). To prove (ii), assume for convenience that $\mathbb{E}_{\pi,0}[Y_1] = 0$. Then
$\lambda^{(1)}(0) = 0$, and differentiating twice the above equality at $\zeta = 0$ gives:
$-\mathbb{E}_{\pi,0}[Y_n^2] = n\,\lambda^{(2)}(0) + L^{(2)}(0,1_{\mathbb{X}}) + R_n^{(2)}(0,1_{\mathbb{X}})$. We obtain the desired property by using again (4.4b). □
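
The identity $\Sigma = -\mathrm{Hess}\,\lambda(0)$ can be observed numerically on a toy model. The following sketch is an illustration under assumptions that are not in the paper: a hypothetical two-state chain, a centered functional $\xi$ with $Y_n = \sum_{k=1}^{n}\xi(X_k)$, and $d = 1$. The dominant eigenvalue $\lambda(\zeta)$ of the $2\times 2$ matrix of $S_1(\zeta)$ is obtained from the quadratic formula, and its curvature at 0 is compared with the series $\sigma^2 = \pi(\xi^2) + 2\sum_{k\ge 1}\pi(\xi P^k\xi)$ of Remark 3.2.

```python
import cmath

# Hypothetical two-state chain P = [[1-A, A], [B, 1-B]] (illustrative values).
A, B = 0.3, 0.2
P = [[1.0 - A, A], [B, 1.0 - B]]
PI = (B / (A + B), A / (A + B))   # invariant law (0.4, 0.6)
XI = (3.0, -2.0)                  # centered: 0.4 * 3 + 0.6 * (-2) = 0

def dominant_eigenvalue(zeta):
    """Eigenvalue of largest modulus of the matrix of S_1(zeta)."""
    m = [[P[x][y] * cmath.exp(1j * zeta * XI[y]) for y in range(2)]
         for x in range(2)]
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max((trace + root) / 2.0, (trace - root) / 2.0, key=abs)

def variance_from_eigenvalue(h=1e-3):
    """Estimate -lambda''(0) from lambda(h) = 1 - sigma^2 h^2 / 2 + O(h^3).

    Only the real part is used: odd-order Taylor coefficients of lambda
    are purely imaginary because S_1(-zeta) is the conjugate of S_1(zeta).
    """
    return 2.0 * (1.0 - dominant_eigenvalue(h).real) / h ** 2

def variance_from_series(terms=200):
    """sigma^2 = pi(xi^2) + 2 * sum_{k>=1} pi(xi * P^k xi), truncated."""
    total = sum(PI[x] * XI[x] ** 2 for x in range(2))
    g = list(XI)
    for _ in range(terms):
        g = [sum(P[x][y] * g[y] for y in range(2)) for x in range(2)]  # P^k xi
        total += 2.0 * sum(PI[x] * XI[x] * g[x] for x in range(2))
    return total

if __name__ == "__main__":
    print(variance_from_series(), variance_from_eigenvalue())
```

For these values both quantities are close to 18: here $P\xi = \xi/2$, so the series equals $\pi(\xi^2)\,(1 + 1/2)/(1 - 1/2) = 6 \times 3$.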
4.3 Refinements of the CLT for discrete-time MAPs

In this subsection, $(X_n,Y_n)_{n\in\mathbb{N}}$ is a MAP taking values in $\mathbb{X}\times\mathbb{R}^d$, with a driving Markov
chain $(X_n)_{n\in\mathbb{N}}$ satisfying (AS1-AS2). The assumptions below imply that $\mathbb{E}_{\pi,0}[|Y_1|] < \infty$, and
for convenience we suppose that $\mathbb{E}_{\pi,0}[Y_1] = 0$ (if not, replace $Y_n$ by $Y_n - n\,\mathbb{E}_{\pi,0}[Y_1]$).

Theorems 4.1 to 4.3 below have been established in [49] for additive components of the
form $Y_n = \sum_{k=1}^{n} \xi(X_k)$. To the best of our knowledge, the present extensions to general
MAP are new.

4.3.1 A local limit theorem 

The classical Markov nonlattice condition is needed to state the local limit theorem (LLT):

Nonlattice condition. There is no $a \in \mathbb{R}^d$, no closed subgroup $H$ in $\mathbb{R}^d$, $H \ne \mathbb{R}^d$, and no
bounded measurable function $\beta : \mathbb{X} \to \mathbb{R}^d$ such that: $Y_1 + \beta(X_1) - \beta(X_0) \in a + H$ $\mathbb{P}_{\pi,0}$-a.s.

This condition is equivalent to the following operator-type property. For each $p \in (1,\infty)$ and
for every compact subset $K$ of $\mathbb{R}^d \setminus \{0\}$, there exists $\rho \in (0,1)$ such that:

\[ \sup_{\zeta\in K} \|S_1(\zeta)^n\|_p = O(\rho^n). \tag{4.5} \]

This result is established in [49, Sect. 5] for additive functionals. The proof for general MAPs
is similar. Since $\mathbb{E}_{\pi,0}[e^{i\langle\zeta,Y_n\rangle}] = \pi(S_1(\zeta)^n 1_{\mathbb{X}})$ by (SG), it follows that

\[ \sup_{\zeta\in K} \big|\mathbb{E}_{\pi,0}[e^{i\langle\zeta,Y_n\rangle}]\big| = O(\rho^n). \]

Theorem 4.1. The assumptions of Theorem 3.1 are supposed to be satisfied, so that $(Y_n/\sqrt{n})_{n\in\mathbb{N}^*}$
converges in distribution to a $d$-dimensional Gaussian vector with covariance matrix $\Sigma$. Let
us assume that $\Sigma$ is a positive definite matrix. Finally, suppose that the nonlattice condition
is true. Then, we have for every compactly supported continuous function $g : \mathbb{R}^d \to \mathbb{R}$:

\[ \lim_{n\to+\infty} \sqrt{\det\Sigma}\ (2\pi n)^{d/2}\ \mathbb{E}_{\pi,0}[g(Y_n)] = \int_{\mathbb{R}^d} g(x)\,dx. \]

Proof. Thanks to (4.3) with $f := 1_{\mathbb{X}}$, Theorem 4.1 can be established as in the i.i.d. case: use
Proposition 4.2 to control $L(\cdot,1_{\mathbb{X}})$ and $R_n(\cdot,1_{\mathbb{X}})$ and, as in [12], use the nonlattice condition
and the following second-order Taylor expansion of $\lambda(\cdot)$, which follows from Theorem 3.1 and
from [46, Lem. 4.2]:

Lemma 4.3. Assume that Conditions (AS2) and (AS3d) with $\alpha = 2$ hold and that $\mathbb{E}_{\pi,0}[Y_1] = 0$.
Then the function $\lambda(\cdot)$ in Equality (4.3) satisfies the following second-order Taylor expansion$^3$:

\[ \lambda(\zeta) = 1 - \langle\zeta,\Sigma\zeta\rangle/2 + o(|\zeta|^2). \]

□

Remark 4.1. We mention that a local limit theorem has been obtained in [66] for the process
$(Y_n := \sum_{k=1}^{n} Z_k)_{n\in\mathbb{N}^*}$ associated with a stationary hidden Markov chain $(X_n,Z_n)_{n\in\mathbb{N}}$. In [66],
$(X_n)_{n\in\mathbb{N}}$ is only assumed to be an ergodic stationary Markov chain, so that the additional
conditions for the local limit theorem to hold are more involved than those of Theorem 4.1.

$^3$ A direct application of Proposition 4.2 gives this expansion, but under Condition (AS3d) with $\alpha > 2$.
4.3.2 Rate of convergence in the one-dimensional CLT

Here we suppose that $d = 1$. Under the condition $\mathbb{E}_{\pi,0}[|Y_1|^{2+\delta}] < \infty$, the asymptotic variance
$\sigma^2$ of Proposition 4.3 is defined by $\sigma^2 := \lim_n \mathbb{E}_{\pi,0}[Y_n^2]/n$.

Theorem 4.2. Under Conditions (AS1-AS2) and (AS3d) for some $\alpha > 3$ and if $\sigma^2 > 0$,
then there exists some constant $B > 0$ such that

\[ \forall n \ge 1, \quad \sup_{u\in\mathbb{R}} \Big| \mathbb{P}_{\pi,0}\Big\{\frac{Y_n}{\sigma\sqrt{n}} \le u\Big\} - \Phi(u) \Big| \le \frac{B}{\sqrt{n}}, \tag{4.6} \]

where $\Phi(\cdot)$ is the distribution function of the Gaussian distribution $\mathcal{N}(0,1)$.

Proof. Here, the functions $\lambda(\cdot)$, $L(\cdot) := L(\cdot,1_{\mathbb{X}})$ and $R_n(\cdot) := R_n(\cdot,1_{\mathbb{X}})$ in Proposition 4.2 are
three times continuously differentiable on $\mathcal{O}$ and satisfy the following properties:

$\sup_{u\in\mathcal{O}} |L(u) - 1|/|u| < \infty$ (from (4.4a) and $L(0) = 1$)

$\sup_{u\in\mathcal{O}} |R_n(u)/u| = O(\kappa^n)$ (from (4.4b) and $R_n(0) = 0$)

$\lambda(u) = 1 - \sigma^2 u^2/2 + O(u^3)$ for $u$ small enough (since $\lambda^{(1)}(0) = 0$ and $\lambda^{(2)}(0) = -\sigma^2$
from Proposition 4.3).

Then, we can borrow the proof of the Berry-Esseen theorem of the i.i.d. case (see [23]). □

Remark 4.2. The details of the previous proof are reported in [48, Th. 2] for the additive
functional $Y_n = \sum_{k=1}^{n} \xi(X_k,X_{k-1})$ of a $V$-geometrically ergodic Markov chain. They are the same in
our context. In fact, by writing out the arguments of [48, Th. 2], we can derive the following
more precise property: the constant $B$ in (4.6) depends on the sequence $(Y_n)_{n\in\mathbb{N}}$, but only
through $\sigma^2$ and $\mathbb{E}_{\pi,0}[|Y_1|^{3+\varepsilon}]$. Of course, this control is not as precise as in the i.i.d. case [23],
but it is enough to obtain interesting statistical properties as in [24, 48] or in Section 5.

Remark 4.3. Let us consider the specific case $Y_n - Y_{n-1} = \xi(X_n)$ for some real-valued measurable
function $\xi$. Under Conditions (AS1-AS2), if the real number $\sigma^2$ defined in Remark 3.2
is positive, then we have (4.6) under the expected moment condition $\pi(|\xi|^3) < \infty$. This
follows from [47, Cor. 3.1] which is based on the spectral method and martingale difference
arguments (see also [49, Sect. 6]). Note that the moment condition $\pi(|\xi|^3) < \infty$ is optimal
according to the i.i.d. case [23].

Remark 4.4. Let $(X_n)_{n\in\mathbb{N}}$ be a $\rho$-mixing Markov chain. The additive functionals of $(X_n)_{n\in\mathbb{N}}$
involved in the M-estimation of Markov models (see (5.5)) are of the form $Y_n = \sum_{k=1}^{n} \xi(X_{k-1},X_k)$.
Since $(X_n,Y_n)_{n\in\mathbb{N}}$ is a MAP, Theorem 4.2 applies provided that $\xi : \mathbb{X}\times\mathbb{X} \to \mathbb{R}$ is a measurable
function such that $\mathbb{E}_{\pi,0}[\xi(X_0,X_1)] = 0$ and $\mathbb{E}_{\pi,0}[|\xi(X_0,X_1)|^{3+\varepsilon}] < \infty$ for some $\varepsilon > 0$. This
will be supported by the statistical result of Section 5.

Finally let us state a first-order Edgeworth expansion.

Theorem 4.3. Assume that Conditions (AS1-AS2) and (AS3d) hold for some $\alpha > 3$, that
$\sigma^2$ is positive and that the nonlattice condition is true. Then, there exists $\mu_3 \in \mathbb{R}$ such that:

\[ \sup_{u\in\mathbb{R}} \Big| \mathbb{P}_{\pi,0}\Big\{\frac{Y_n}{\sigma\sqrt{n}} \le u\Big\} - \Phi(u) - \frac{\mu_3}{6\sigma^3\sqrt{n}}\,(1 - u^2)\,\eta(u) \Big| = o\Big(\frac{1}{\sqrt{n}}\Big), \tag{4.7} \]

where $\eta(\cdot)$ is the density of the Gaussian distribution $\mathcal{N}(0,1)$.
Other limit theorems can be stated under Condition (AS2) as, for instance, a multidimensional
Berry-Esseen theorem in the Prohorov metric (see [49, Sect. 9]), and the multidimensional
renewal theorems (see [39]). Although Proposition 4.2 extends to the case when
the order of regularity $m_0$ is not an integer, it does not allow us to deal with the convergence of
$Y_n$ (properly normalized) to stable laws, since we assume $\alpha > m_0$ (in place of the expected
condition $\alpha = m_0$). For an additive functional $Y_n = \sum_{k=1}^{n} \xi(X_k)$, a careful examination
of the proof of Lemmas 4.1 and 4.2 shows that this limitation could be overcome under a
condition of the type: $\xi \in \mathbb{L}^{\beta}(\pi) \Rightarrow P\xi \in \mathbb{L}^{\beta'}(\pi)$ with $\beta' > \beta$. Anyway, mention that, under
Condition (AS2) and the previous condition on $\xi$, convergence to stable laws is obtained in
[54, Section 2.3] by using a "martingale approximation" approach. A natural question is to
ask whether the last condition on $\xi$ is necessary.

4.4 The non-stationary case 

Under (AS2), we discuss the extension of the previous results to the non-stationary case. Let
$\mu$ be the initial distribution of $(X_n)_{n\in\mathbb{N}}$. The careful use of [49, Prop. 7.3] allows us to extend
Proposition 4.2 as follows. Under condition (AS3d)$^4$ with $\alpha > m_0$, and under the following
assumption on $\mu$

(NS) $\mu$ is a bounded linear form on $\mathbb{L}^r(\pi)$ with $r$ such that $1 < r < \alpha s/(\alpha + m_0 s)$,

where $s > \alpha/(\alpha - m_0)$, all the conclusions of Proposition 4.2 remain true when $\pi$ is replaced
by $\mu$, namely: for some bounded open neighborhood $\mathcal{O}$ of $\zeta = 0$ in $\mathbb{R}^d$, we have for $f \in \mathbb{L}^s(\pi)$

\[ \forall n \in \mathbb{N},\ \forall \zeta \in \mathcal{O}, \quad \mathbb{E}_{\mu,0}\big[e^{i\langle\zeta,Y_n\rangle} f(X_n)\big] = \lambda(\zeta)^n L(\zeta,f,\mu) + R_n(\zeta,f,\mu), \tag{4.8} \]

with $\mathbb{C}$-valued functions $\lambda(\cdot)$, $L(\cdot,f,\mu)$, $R_n(\cdot,f,\mu)$ satisfying the same properties as in Proposition 4.2.
It is worth noticing that $\lambda(\cdot)$ is the same function as in (4.3), contrary to $L(\cdot,f,\mu)$
and $R_n(\cdot,f,\mu)$ which both depend on $\mu$.

Condition (NS) means that $\mu$ is absolutely continuous with respect to $\pi$ with density
$\phi \in \mathbb{L}^{r'}(\pi)$ where $r' = r/(r-1)$ is the conjugate number of $r$. It is easily checked that
$r' > \alpha s/((\alpha - m_0)s - \alpha) > 1$. Note that the bigger the exponent $\alpha$ in Condition (AS3d),
the closer to 1 the allowed value of $r'$.
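
The arithmetic behind the bound on $r'$ can be made explicit. The short sketch below (a hypothetical helper, not part of the paper) computes, in exact rational arithmetic, the upper limit $\alpha s/(\alpha + m_0 s)$ for $r$ in (NS) and the corresponding lower limit $\alpha s/((\alpha - m_0)s - \alpha)$ for the conjugate exponent $r'$, using the fact that $r \mapsto r/(r-1)$ is decreasing on $(1,+\infty)$.

```python
from fractions import Fraction

def ns_exponents(alpha, m0, s):
    """Return (upper limit for r, lower limit for r') under condition (NS).

    Requires alpha > m0 and s > alpha / (alpha - m0), as in the text.
    """
    alpha, m0, s = Fraction(alpha), Fraction(m0), Fraction(s)
    if not alpha > m0:
        raise ValueError("alpha must exceed m0")
    if not s > alpha / (alpha - m0):
        raise ValueError("s must exceed alpha / (alpha - m0)")
    r_upper = alpha * s / (alpha + m0 * s)   # r must stay strictly below this
    r_conj_lower = r_upper / (r_upper - 1)   # equals alpha*s / ((alpha-m0)*s - alpha)
    return r_upper, r_conj_lower

if __name__ == "__main__":
    print(ns_exponents(4, 1, 2))       # moderate moment order
    print(ns_exponents(100, 1, 100))   # large alpha and s: r' may be close to 1
```

With $\alpha = 4$, $m_0 = 1$, $s = 2$, the density of $\mu$ has to lie in $\mathbb{L}^{r'}(\pi)$ for some $r' > 4$, whereas with $\alpha = s = 100$ any $r' > 50/49$ is allowed.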

Proposition 4.3 extends to the non-stationary case as follows. 

(i) If (AS3d) and (NS) hold with $m_0 = 1$, then $\nabla\lambda(0) = i \lim_n \mathbb{E}_{\mu,0}[Y_n]/n$.

(ii) If (AS3d) and (NS) hold with $m_0 = 2$, then the conclusions of Proposition 4.3(ii) remain
true with $\mu$ in place of $\pi$.

Using the decomposition (4.8) (with $f := 1_{\mathbb{X}}$), we obtain as in the stationary case the following
statements.

1. Under (AS2), (AS3d) with $\alpha = 2$ and $\mu$ satisfying condition (NS) with $m_0 = 1$: the
CLT, and the LLT under the additional non-lattice condition.

2. Under (AS2), (AS3d) with $\alpha > 3$ and $\mu$ satisfying condition (NS) with $m_0 = 3$: the
Berry-Esseen bound, and under the non-lattice condition, the first-order Edgeworth
expansion (4.7) with the additional term $-b_\mu\,\eta(u)/(\sigma\sqrt{n})$, where $b_\mu$ is the asymptotic
bias: $b_\mu = \lim_n \mathbb{E}_{\mu,0}[Y_n]$ (see [49] for details).

$^4$ In this non-stationary case, we only require condition (AS3d) with the stationary distribution $\pi$, and the
mean vector remains $\mathbb{E}_{\pi,0}[Y_1]$.

For instance, let us sketch the proof of the CLT. Equality (4.8) with $f := 1_{\mathbb{X}}$ gives

\[ \mathbb{E}_{\mu,0}\big[e^{i\langle\zeta,Y_n/\sqrt{n}\rangle}\big] = \lambda(\zeta/\sqrt{n})^n\, L(\zeta/\sqrt{n},1_{\mathbb{X}},\mu) + R_n(\zeta/\sqrt{n},1_{\mathbb{X}},\mu). \]

Since $m_0 = 1$ we have $\lim_n L(\zeta/\sqrt{n},1_{\mathbb{X}},\mu) = 1$ and $\lim_n R_n(\zeta/\sqrt{n},1_{\mathbb{X}},\mu) = 0$. Finally, the
second-order Taylor expansion of Lemma 4.3 shows that $\lim_n \lambda(\zeta/\sqrt{n})^n = \exp(-\langle\zeta,\Sigma\zeta\rangle/2)$.
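
The convergence of the characteristic function used in this sketch can be observed numerically. The code below is only an illustration under assumptions not made in the paper: it takes the stationary start $\mu = \pi$, a hypothetical two-state chain, a centered functional $\xi$ with $Y_n = \sum_{k=1}^{n}\xi(X_k)$ and $d = 1$, and evaluates $\mathbb{E}_{\pi,0}[e^{i\zeta Y_n/\sqrt{n}}] = \pi(S_1(\zeta/\sqrt{n})^n 1_{\mathbb{X}})$ exactly through a matrix power. For this chain the asymptotic variance is $\sigma^2 = 18$.

```python
import cmath
import math

# Hypothetical two-state chain, invariant law and centered functional.
P = [[0.7, 0.3], [0.2, 0.8]]
PI = (0.4, 0.6)
XI = (3.0, -2.0)
SIGMA2 = 18.0   # pi(xi^2) + 2 * sum_k pi(xi P^k xi) = 6 + 12 for these values

def mat_mul(A, B):
    """Product of two 2x2 complex matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(M, n):
    """M^n by binary exponentiation (n >= 0)."""
    result = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    base = M
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result

def stationary_char_fn(zeta, n):
    """E_{pi,0}[exp(i * zeta * Y_n / sqrt(n))] = pi(S_1(zeta/sqrt(n))^n 1)."""
    u = zeta / math.sqrt(n)
    S = [[P[x][y] * cmath.exp(1j * u * XI[y]) for y in range(2)]
         for x in range(2)]
    Sn = mat_pow(S, n)
    return sum(PI[x] * (Sn[x][0] + Sn[x][1]) for x in range(2))

if __name__ == "__main__":
    zeta = 0.5
    for n in (10, 1000, 10 ** 6):
        print(n, stationary_char_fn(zeta, n), math.exp(-SIGMA2 * zeta ** 2 / 2))
```

As $n$ grows, the computed value approaches $\exp(-\sigma^2\zeta^2/2)$, which is the Gaussian limit of the sketch above in dimension one.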

In general, the previous statements 1. and 2. do not apply to the case when the initial
distribution $\mu$ is a Dirac mass (which is not defined on $\mathbb{L}^r(\pi)$). However, when the state
space $\mathbb{X}$ of the driving Markov chain is discrete, these statements are valid with any initial
distribution $\delta_x$ provided that $\pi(x) > 0$ (because $\delta_x$ is then a continuous linear form on each
$\mathbb{L}^p(\pi) = \ell^p(\pi)$).



4.5 The continuous-time case 

In this section, we consider the case where $\mathbb{T} = [0,+\infty)$. The process $(X_t)_{t\ge 0}$ is assumed to
satisfy Conditions (AS1-AS2). Let us mention that the moment condition (AS3) reduces to

\[ \forall v \in (0,1], \quad \mathbb{E}_{\pi,0}\big[|Y_v|^\alpha\big] < \infty \]

when the semigroup $(Q_t)_{t\ge 0}$ is strongly continuous on $\mathbb{L}^2((\pi,0))$ (so is $(P_t)_{t\ge 0}$ on $\mathbb{L}^2(\pi)$).

All the theorems of the previous subsection are extended to $(Y_t)_{t>0}$. Recall that Theorems 4.1
to 4.3 concern the multidimensional local limit theorem, the one-dimensional Berry-Esseen
theorem and the one-dimensional first-order Edgeworth expansion respectively. For the
sake of simplicity, we still assume that $\mathbb{E}_{\pi,0}[Y_1] = 0$.

Theorem 4.4. The conclusions of Theorems 4.1 to 4.3 are valid for $(Y_t/\sqrt{t})_{t>0}$ under the
same assumptions, up to the following change: the moment condition (AS3d) is reinforced
(with the same condition on $\alpha$) in (AS3):

\[ \sup_{v\in(0,1]} \mathbb{E}_{\pi,0}\big[|Y_v|^\alpha\big] < \infty. \]

Note that the extensions to the non-stationary case presented in Subsection 4.4 can be
adapted to the continuous-time case.

When $Y_t$ is defined by $Y_t := \int_0^t \xi(X_s)\,ds$, any moment condition of the type $\sup_{v\in[0,1]} \mathbb{E}_{\pi,0}[|Y_v|^\alpha] < \infty$
($\alpha \ge 1$) is fulfilled if we have $\pi(|\xi|^\alpha) < \infty$. Indeed:

\[ \forall v \in [0,1], \quad \mathbb{E}_{\pi,0}\big[|Y_v|^\alpha\big] \le \mathbb{E}_{\pi,0}\Big[\int_0^1 |\xi(X_s)|^\alpha\,ds\Big] = \int_0^1 \mathbb{E}_{\pi,0}\big[|\xi(X_s)|^\alpha\big]\,ds = \pi(|\xi|^\alpha). \]

Note that the nonlattice condition used in Theorem 4.4 is the same as in the discrete-time
case (see Subsection 4.3.1) and plays the same role. Indeed, writing $t = n + v$ where $n$ is the
integer part of $t$, we know that $\mathbb{E}_{\pi,0}[e^{i\langle\zeta,Y_t\rangle}] = \pi\big(S_1(\zeta)^n (S_v(\zeta)1_{\mathbb{X}})\big)$. Using (4.5) and the fact
that $S_v(\zeta)$ is a contraction on $\mathbb{L}^p(\pi)$ ($p \in (1,+\infty)$), it follows that

\[ \sup_{\zeta\in K} \big|\mathbb{E}_{\pi,0}[e^{i\langle\zeta,Y_t\rangle}]\big| = O(\rho^n). \tag{4.9} \]
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We prove Proposition 4.4 below which is the continuous-time version of Proposition 4.2. 
Then, combining Proposition 4.4 with relation (4.9), the Fourier techniques of the i.i.d. case 
can be used to extend Theorems 4.1-4.3 to {Y t /y/j) t >o 

Proposition 4.4. Let mo E N*. Write time t as t = n + v where n is the integer part of 
t. Under condition (A S3) for some a > mo, there exists a bounded open neighborhood O of 
C = in M. d such that we have for all f € 1/(71") with any s > a /(a — mo): 

VtG(0,+oo), VCGO, E^ [e i{C ' Yt) f(X t )] = A(C) n L(C, S v (()f) + R n {(,S v (()f), 

where A(-), L(-,-) and R n (-,-) are the functions of Proposition 4-2. Moreover, the C-valued 
functions L v j (() := L(£, S v (C)f) and R n ,vj(C) := -^n(C) Sv(C)f) are of class C m ° onO, and 
we have the following properties for £ = 0, . . . , mo: 

sup \L®JQ\ < oo 
CeO,«e[o,i] 

3k € (0,1), sup \B® vf (Q\ = 0( K n ). 
CeO,«e[o,i] 

Note that we have A(0) = 1, L v j(0) = ir(f), and i? n> ^i x (0) = 0. 

Proof. From (4.1) and (SG), we obtain for any ( £ R rf , / G (1 < p < oo): 

E nj o[e i{C > Yt) f(X t )] =ir(S n+v (t)f) = 7r(5i(C) n (^(C)/)) =E^ [e J ^ {S v (()f)(X n )], 

(4.10) 

and the desired expansion then follows from Proposition 4.2. The two following (straight- 
forward) extensions of Lemmas 4.1-4.2 are needed to establish the others assertions. Let 

t e (o,+oo). 

Lemma 4.4. If $1 \le p' < p$, then the map $\zeta \mapsto S_t(\zeta)$ is continuous from $\mathbb{R}^d$ to $\mathcal{L}(L^p, L^{p'})$.

Lemma 4.5. Assume that $E_{\pi,0}[|Y_t|^\alpha] < \infty$ for some $\alpha > m_0$ ($m_0 \in \mathbb{N}^*$), and let $1 \le j \le m_0$.
If $p > 1$ and $p_j := \alpha p/(\alpha + jp) \ge 1$, then $\zeta \mapsto S_t(\zeta)$ is $j$-times continuously differentiable
from $\mathbb{R}^d$ to $\mathcal{L}(L^p, L^{p_j})$, and $\sup_{\zeta\in\mathbb{R}^d} \|S_t^{(j)}(\zeta)\|_{p,p_j} \le E_{\pi,0}[|Y_t|^\alpha]^{j/\alpha}$.

The regularity properties (in $\zeta$) of the functions $L(\zeta, S_v(\zeta)f)$ and $R_n(\zeta, S_v(\zeta)f)$ are not a
direct consequence of those stated in Proposition 4.2 because of the additional term $S_v(\zeta)f$.
To that effect we need a careful use of the operator-type derivation procedure. This part is
postponed to Appendix A on the basis of [49]. □
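
The exponent $p_j$ of Lemma 4.5 is the one that Hölder's inequality produces. The following is only a sketch in dimension $d = 1$, under the assumption (not spelled out in the text) that the $j$-th derivative of $S_t(\cdot)$ is obtained by differentiating under the expectation; it uses Jensen's inequality, then Hölder's inequality with the conjugate exponents $\alpha/(jp_j)$ and $p/p_j$, and the stationarity of $(X_t)$:

```latex
% Sketch only: d = 1, differentiation under the expectation is assumed.
\begin{aligned}
S_t^{(j)}(\zeta) f(x) &= E_{(x,0)}\big[(iY_t)^j e^{i\zeta Y_t} f(X_t)\big],
\qquad
\frac{1}{p_j} = \frac{1}{p} + \frac{j}{\alpha}
\iff p_j = \frac{\alpha p}{\alpha + jp}, \\
\|S_t^{(j)}(\zeta) f\|_{p_j}
&\le E_{\pi,0}\big[|Y_t|^{jp_j}\,|f(X_t)|^{p_j}\big]^{1/p_j}
\le E_{\pi,0}\big[|Y_t|^{\alpha}\big]^{j/\alpha}\,\|f\|_{p}.
\end{aligned}
```

Each derivative therefore costs a fixed amount $1/\alpha$ in the reciprocal of the Lebesgue exponent, which is why $j$ derivatives require $\alpha p/(\alpha + jp) \ge 1$.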

5 A Berry-Esseen theorem for the M-estimators of $\rho$-mixing
Markov chains

The M-estimators are a general class of estimators in parametric statistics. This class covers the
special cases of maximum likelihood estimators, least squares estimators and minimum
contrast estimators. In the i.i.d. case, a modern treatment of M-estimation is reported in
[85, Chap. 5], and a Berry-Esseen bound for M-estimators is obtained in [74]. In a statistical
framework, such a bound has to be uniform in the parameters. Pfanzagl's method, which is
applied to Markov data in [48], requires a preliminary result on the rate of convergence in
the CLT for additive functionals, with a precise control of the constants with respect to the
functional (cf. Remark 5.1). Earlier extensions of [74] to the Markov context are discussed in
[48]. For $\rho$-mixing Markov chains, the closest work to ours is [76]. Our main improvement
is on the moment conditions, which are now close to those of the i.i.d. case. A detailed
comparison is presented at the end of the section.

Let $\Theta$ be any nonempty parameter set. For a Markov chain $(X_n)_{n\in\mathbb{N}}$ with state space $X$
and transition kernel $P_\theta$ which depends on $\theta \in \Theta$, we introduce the uniform $L^2(\pi)$-spectral
gap (i.e. the uniform $\rho$-mixing) property.

(M) The Markov chain $(X_n)_{n\in\mathbb{N}}$ has a uniform $L^2(\pi)$-spectral gap with respect to the
parameter set $\Theta$ if

1. for all $\theta \in \Theta$, $(X_n)_{n\in\mathbb{N}}$ has a unique $P_\theta$-invariant distribution $\pi_\theta$;

2. for all $\theta \in \Theta$, $(X_n)_{n\in\mathbb{N}}$ is stationary (i.e. $X_0 \sim \pi_\theta$);

3. its transition kernel satisfies Condition (AS2) in a uniform way with respect to $\theta$,
namely there exist $C > 0$ and $\kappa \in (0,1)$ such that

$\forall \theta \in \Theta,\ \forall n \ge 1,\quad \|P_\theta^n - \Pi_\theta\|_2 \le C\kappa^n,$

where $\Pi_\theta(f) := \pi_\theta(f)\,1_X$ for $f \in L^2(\pi_\theta)$.
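
Item 3 of Condition (M) can be checked by hand on a finite state space. The sketch below is a toy illustration only: the two-state kernel, the parameter grid `THETA` and the function names are assumptions made for the example and do not come from the paper. It uses the fact that conjugating by $\mathrm{diag}(\sqrt{\pi_\theta})$ turns the $L^2(\pi_\theta)$ operator norm into the Euclidean one, and that for a two-state chain the norm of $P_\theta^n - \Pi_\theta$ equals $|1-a-b|^n$.

```python
import numpy as np

def l2_pi_gap_norm(a, b, n):
    """Operator norm of P^n - Pi on L^2(pi) for the two-state kernel
    P = [[1-a, a], [b, 1-b]] (a toy model chosen for illustration)."""
    P = np.array([[1.0 - a, a], [b, 1.0 - b]])
    pi = np.array([b, a]) / (a + b)            # invariant distribution of P
    Pi = np.outer(np.ones(2), pi)              # (Pi f)(x) = pi(f) for every x
    A = np.linalg.matrix_power(P, n) - Pi
    d = np.sqrt(pi)
    # Conjugating by diag(sqrt(pi)) maps the L^2(pi) norm to the Euclidean norm.
    return np.linalg.norm((d[:, None] * A) / d[None, :], 2)

# Hypothetical parameter set Theta: a grid of (a, b) kept away from 0 and 1.
THETA = [(a, b) for a in (0.2, 0.4, 0.6) for b in (0.3, 0.5, 0.7)]
KAPPA = max(abs(1.0 - a - b) for a, b in THETA)   # uniform rate over Theta

def uniform_gap_holds(n_max=20, C=1.0):
    """Check ||P_theta^n - Pi_theta||_2 <= C * KAPPA**n for all theta and n."""
    return all(l2_pi_gap_norm(a, b, n) <= C * KAPPA**n + 1e-12
               for a, b in THETA for n in range(1, n_max + 1))
```

Calling `uniform_gap_holds()` checks the bound with $C = 1$ and the single rate `KAPPA` over the whole grid; uniformity would fail if the grid were allowed to approach $a + b = 0$ or $a + b = 2$.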

In order to derive a Berry-Esseen bound for the M-estimators of $(X_n)_{n\in\mathbb{N}}$ satisfying (M),
we need a uniform Berry-Esseen bound for some specific additive functionals of the Markov
chain $(X_n)_{n\in\mathbb{N}}$. In the next subsection, we propose such a uniform Berry-Esseen bound for
the second component of a general parametric MAP. This result will be applied to the MAPs
associated with these specific additive functionals (see Remark 5.1).



5.1 A uniform Berry-Esseen bound for the second component of a parametric MAP

Here we propose a refinement of Proposition 4.3 and Theorem 4.2. Let us introduce the
following condition.

(A) For every $\theta \in \Theta$, $(X_n,Y_n)_{n\in\mathbb{N}}$ is an $X \times \mathbb{R}$-valued MAP, $Y_1$ is $P_\theta$-integrable and centered
(i.e. $E_\theta[Y_1] = 0$).

Below, the driving Markov chain $(X_n)_{n\in\mathbb{N}}$ is assumed to satisfy condition (M). Thus, the
notation $P_\theta$ stands for the underlying probability measure, which depends on $\theta$ through the
transition kernel $Q_\theta$ of $(X_n, Y_n)_{n\in\mathbb{N}}$ and the initial (stationary) distribution $(\pi_\theta, 0)$. $E_\theta[\cdot]$
denotes the associated expectation.

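Theorem 5.1. Assume that Condition (A) is true for the MAP $(X_n,Y_n)_{n\in\mathbb{N}}$ and that the
driving Markov chain $(X_n)_{n\in\mathbb{N}}$ satisfies Condition (M). If $M_2 := \sup_{\theta\in\Theta} E_\theta[|Y_1|^{2+\varepsilon}] < \infty$
with some $\varepsilon > 0$, then $\sigma^2(\theta) := \lim_n E_\theta[Y_n^2]/n$ is well-defined and is finite for each $\theta \in \Theta$, the
function $\sigma^2(\cdot)$ is bounded on $\Theta$, and there exists a positive constant $C_Y$ such that

$\forall n \ge 1,\quad \sup_{\theta\in\Theta} \left| \sigma^2(\theta) - \frac{E_\theta[Y_n^2]}{n} \right| \le \frac{C_Y}{n}.$ (5.1)

The constant $C_Y$ depends on the sequence $(Y_n)_{n\in\mathbb{N}}$, but only through the constant $M_2$.
If the two following additional conditions hold true

$\exists \varepsilon > 0,\quad M_3 := \sup_{\theta\in\Theta} E_\theta[|Y_1|^{3+\varepsilon}] < \infty,$ (5.2)

$\sigma_0 := \inf_{\theta\in\Theta} \sigma(\theta) > 0,$ (5.3)
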
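then there exists a positive constant $B_Y$ such that

$\forall \theta \in \Theta,\ \forall n \ge 1,\quad \sup_{a\in\mathbb{R}} \left| P_\theta\left\{ \frac{Y_n}{\sigma(\theta)\sqrt{n}} \le a \right\} - \Phi(a) \right| \le \frac{B_Y}{\sqrt{n}}.$ (5.4)

The constant $B_Y$ depends on the sequence $(Y_n)_{n\in\mathbb{N}}$, but only through $\sigma_0$ and the constant $M_3$.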

Recall that the proofs of Proposition 4.3 and Theorem 4.2 are based on Proposition 4.2.
Here, for any fixed $\theta \in \Theta$, Proposition 4.2 applies and gives an expansion of $E_\theta[e^{i\zeta Y_n} f(X_n)]$,
but for a neighbourhood $\mathcal{O}_\theta$ of $\zeta = 0$, some $\mathbb{C}$-valued functions $\lambda_\theta(\cdot)$, $L_\theta(\cdot,f)$, $R_{\theta,n}(\cdot,f)$,
and some $\kappa_\theta \in (0,1)$, which all may depend on $\theta$. Consequently, in order to prove
Theorem 5.1, we must establish that, under Conditions (M), (A) and the moment condition
$\sup_{\theta\in\Theta} E_\theta[|Y_1|^{m_0+\varepsilon}] < \infty$, all the conclusions of Proposition 4.2 are fulfilled in a uniform
way with respect to $\theta \in \Theta$. This has been done in [48, Sect. III.2] in the context of
$V$-geometrically ergodic Markov chains. The arguments in the present setting are the same
up to the following changes: replace the uniform $V$-geometrical ergodicity assumption of [48]
by Assumption (M), and replace the domination condition $(D_{m_0})$ of [48] by the moment
condition $\sup_{\theta\in\Theta} E_\theta[|Y_1|^{m_0+\varepsilon}] < \infty$. The previous assumptions allow us to extend Lemmas
4.1-4.2, and so Proposition 4.2, in a uniform way in $\theta \in \Theta$.
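
To make the asymptotic variance $\lim_n E_\theta[Y_n^2]/n$ concrete, here is a small sketch for the simplest additive functional $Y_n = g(X_1) + \dots + g(X_n)$ of a stationary two-state chain. The kernel, the function `g` and the function names are assumptions made for the illustration; the closed form $\pi(g^2)(1+\lambda)/(1-\lambda)$ with $\lambda = 1-a-b$ is specific to two-state chains, where every centered function is an eigenvector of the kernel.

```python
import numpy as np

def variance_ratio(a, b, g, n):
    """Exact E[Y_n^2]/n for Y_n = g(X_1)+...+g(X_n), where (X_k) is the
    stationary two-state chain with kernel [[1-a, a], [b, 1-b]]."""
    P = np.array([[1.0 - a, a], [b, 1.0 - b]])
    pi = np.array([b, a]) / (a + b)
    g = np.asarray(g, dtype=float)
    g = g - pi @ g                       # center g under pi
    total = n * (pi @ (g * g))           # variance terms
    Pk_g = g.copy()
    for k in range(1, n):
        Pk_g = P @ Pk_g                  # P^k g
        # Cov(g(X_i), g(X_{i+k})) = pi(g * P^k g); there are n - k such pairs.
        total += 2.0 * (n - k) * (pi @ (g * Pk_g))
    return total / n

def asymptotic_variance(a, b, g):
    """Closed-form limit of E[Y_n^2]/n for the two-state chain."""
    pi = np.array([b, a]) / (a + b)
    g = np.asarray(g, dtype=float)
    g = g - pi @ g
    lam = 1.0 - a - b                    # second eigenvalue of the kernel
    return (pi @ (g * g)) * (1.0 + lam) / (1.0 - lam)
```

In this example the difference between `variance_ratio(a, b, g, n)` and `asymptotic_variance(a, b, g)` decays like $1/n$, with a constant that stays bounded as long as $|1-a-b|$ stays away from 1.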

Remark 5.1. In the next subsection, Theorem 5.1 will be applied as follows. Given a
Markov chain $(X_n)_{n\in\mathbb{N}}$ satisfying Condition (M) with respect to $\Theta$, we consider the MAP
$(X_n, Y_n(p))_{n\in\mathbb{N}}$ where $Y_n(p)$ depends on some parameter $p \in \mathcal{P}$ and is of the form

$Y_n(p) := \sum_{k=1}^{n} g(p, X_{k-1}, X_k).$

The property of the constant $C_Y$ in Theorem 5.1 ensures that Inequality (5.1) is uniform in
$p$ and $\theta$ when $M_2 := \sup_{p\in\mathcal{P},\,\theta\in\Theta} E_\theta[|Y_1(p)|^{2+\varepsilon}] < \infty$ (of course, the asymptotic variance in
(5.1) is replaced by some $\sigma^2(\theta,p)$). In the same way, the Berry-Esseen bound (5.4) is uniform
in $p$ and $\theta$ when $M_3 := \sup_{p\in\mathcal{P},\,\theta\in\Theta} E_\theta[|Y_1(p)|^{3+\varepsilon}] < \infty$ and $\inf_{p\in\mathcal{P},\,\theta\in\Theta} \sigma(\theta,p) > 0$.

Note that these comments extend to a general MAP $(X_n,Y_n)_{n\in\mathbb{N}}$ which may depend on
some parameter $\gamma$ via its probability distribution and its functional form, provided that the
bounds $M_2$, $M_3$, $\sigma_0$ in Theorem 5.1 are uniform in $\gamma$.

Remark 5.2. The conclusions of Theorem 5.1 are also valid when $X_0 \sim \mu_\theta$ with $\mu_\theta$ of the
form $\mu_\theta = \phi_\theta\, d\pi_\theta$, provided that $\sup_{\theta\in\Theta} \|\phi_\theta\|_{r'} < \infty$, with $r'$ defined as in Subsection 4.4
(case $m_0 = 3$).



5.2 A Berry-Esseen bound for the M-estimators of $\rho$-mixing Markov chains

Throughout this subsection, $\Theta$ is some general parameter space and $(X_n)_{n\ge0}$ is a Markov
chain with state space $X$ satisfying the uniform $L^2(\pi)$-spectral gap condition (M). The
underlying probability measure and the associated expectation are denoted by $P_\theta$ and $E_\theta[\cdot]$.
Recall that $(X_n)_{n\in\mathbb{N}}$ is assumed to be stationary under (M). Let us introduce the additive
functional of $(X_n)_{n\ge0}$

$M_n(\alpha) = \frac{1}{n} \sum_{k=1}^{n} F(\alpha, X_{k-1}, X_k)$ (5.5)

where $\alpha = \alpha(\theta) \in A$ is the parameter of interest, $F(\cdot,\cdot,\cdot)$ is a real-valued measurable function
on $A \times X^2$ and $A$ is an open interval on the real line. Function $F$ is assumed to satisfy the
following moment condition

$\sup\{ E_\theta[|F(\alpha, X_0, X_1)|],\ \theta \in \Theta,\ \alpha \in A \} < \infty.$ (5.6)

Set $M_\theta(\alpha) := E_\theta[F(\alpha, X_0, X_1)]$. We assume that, for each $\theta \in \Theta$, there exists a unique
$\alpha_0 = \alpha_0(\theta) \in A$, the so-called true value of the parameter of interest, such that we have
$M_\theta(\alpha) > M_\theta(\alpha_0)$, $\forall \alpha \neq \alpha_0$. To estimate $\alpha_0$, we consider the M-estimator $\hat\alpha_n$ defined by

$M_n(\hat\alpha_n) = \min_{\alpha\in A} M_n(\alpha).$

Also assume that, for all $(x,y) \in X^2$, the map $\alpha \mapsto F(\alpha,x,y)$ is twice continuously
differentiable on $A$. Let $F^{(1)}$ and $F^{(2)}$ be the first and second order partial derivatives of $F$ with
respect to $\alpha$. Then

$M_n^{(1)}(\alpha) = \frac{1}{n} \sum_{k=1}^{n} F^{(1)}(\alpha, X_{k-1}, X_k),\qquad M_n^{(2)}(\alpha) = \frac{1}{n} \sum_{k=1}^{n} F^{(2)}(\alpha, X_{k-1}, X_k).$ (5.7)

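We shall appeal to the following assumptions.

(V0) There exists some real constant $\varepsilon > 0$ such that

$\sup_{\theta\in\Theta,\,\alpha\in A} E_\theta\big[ |F^{(1)}(\alpha, X_0, X_1)|^{3+\varepsilon} + |F^{(2)}(\alpha, X_0, X_1)|^{3+\varepsilon} \big] < \infty.$

(V1) $\forall \theta \in \Theta$, $E_\theta[F^{(1)}(\alpha_0, X_0, X_1)] = 0$ and $\alpha_0 = \alpha_0(\theta)$ is the unique parameter value for
which this property is true;

(V2) $m(\theta) := E_\theta[F^{(2)}(\alpha_0, X_0, X_1)]$ satisfies $\inf_{\theta\in\Theta} m(\theta) > 0$;

(V3) $\forall n \ge 1$, $M_n^{(1)}(\hat\alpha_n) = 0$.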
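
As a concrete instance of the objects above, the sketch below takes the quadratic contrast $F(\alpha,x,y) = (y - \alpha x)^2$, which is an illustrative choice and not an example from the paper. For this contrast the minimizer of $M_n$ has a closed form, and the first-order condition $M_n^{(1)}(\hat\alpha_n) = 0$ can be checked directly. The function names are hypothetical, and the minimizer is computed over the whole real line, so the sketch assumes that it falls inside the interval $A$.

```python
def m_n(alpha, path):
    """M_n(alpha) = (1/n) * sum_k F(alpha, X_{k-1}, X_k) with the
    illustrative contrast F(alpha, x, y) = (y - alpha * x)**2."""
    n = len(path) - 1
    return sum((path[k] - alpha * path[k - 1]) ** 2 for k in range(1, n + 1)) / n

def m_n_prime(alpha, path):
    """M_n^{(1)}(alpha): average of F^{(1)}(alpha, x, y) = -2 * x * (y - alpha * x)."""
    n = len(path) - 1
    return sum(-2.0 * path[k - 1] * (path[k] - alpha * path[k - 1])
               for k in range(1, n + 1)) / n

def m_estimator(path):
    """Minimizer of M_n for the quadratic contrast (closed form).
    Requires at least one nonzero value among path[0], ..., path[-2]."""
    num = sum(path[k - 1] * path[k] for k in range(1, len(path)))
    den = sum(path[k - 1] ** 2 for k in range(1, len(path)))
    return num / den
```

For an observed path `x = [x_0, ..., x_n]`, `m_estimator(x)` returns $\hat\alpha_n$ and `m_n_prime(m_estimator(x), x)` vanishes up to rounding. Here $F^{(2)}(\alpha,x,y) = 2x^2$ does not depend on $\alpha$, so the quantity $m(\theta)$ of (V2) reduces to $2E_\theta[X_0^2]$.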

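Notice that (V0) gives $\sup_{\theta\in\Theta} m(\theta) < \infty$. Set $Y_n^{(1)}(\alpha) := nM_n^{(1)}(\alpha)$ and $Y_n^{(2)}(\alpha) := nM_n^{(2)}(\alpha)$.
Then, thanks to Theorem 5.1 applied to the MAPs $(X_n, Y_n^{(1)}(\alpha))_{n\in\mathbb{N}}$ and $(X_n, Y_n^{(2)}(\alpha))_{n\in\mathbb{N}}$,
the conditions (V0)-(V2) enable us to define the asymptotic variances:

$\sigma_1^2(\theta) := \lim_n \frac{1}{n} E_\theta\big[ Y_n^{(1)}(\alpha_0)^2 \big],\qquad \sigma_2^2(\theta) := \lim_n \frac{1}{n} E_\theta\big[ (Y_n^{(2)}(\alpha_0) - n\,m(\theta))^2 \big],$

and we know that $\sup_{\theta\in\Theta} \sigma_j(\theta) < \infty$ for $j = 1,2$. The following additional conditions are
also required:

(V4) $\inf_{\theta\in\Theta} \sigma_j(\theta) > 0$ for $j = 1,2$.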

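(V5) There exist $\eta > 2$ and a measurable function $W \ge 0$ such that $\sup_{\theta\in\Theta} E_\theta[W^\eta] < \infty$
and

$\forall (\alpha,\alpha') \in A^2,\ \forall (x,y) \in X^2,\quad |F^{(2)}(\alpha,x,y) - F^{(2)}(\alpha',x,y)| \le |\alpha - \alpha'|\,(W(x) + W(y)).$

(V6) There exists a sequence $\gamma_n \to 0$ such that

$\sup_{\theta\in\Theta} P_\theta\{ |\hat\alpha_n - \alpha_0| \ge d \} \le \gamma_n,$

with $d := \inf_{\theta\in\Theta} m(\theta)/(4(E_\theta[W(X_0)] + 1))$.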
Theorem 5.2. Assume that the Markov chain $(X_n)_{n\in\mathbb{N}}$ satisfies Condition (M), that $F$
satisfies Condition (5.6), that the M-estimator $\hat\alpha_n$ is defined as above, and finally that
Conditions (V0)-(V6) are fulfilled. Set $\tau(\theta) := \sigma_1(\theta)/m(\theta)$. Then there exists a positive constant
$C$ such that

$\forall n \ge 1,\quad \sup_{\theta\in\Theta} \sup_{u\in\mathbb{R}} \left| P_\theta\left\{ \frac{\sqrt{n}}{\tau(\theta)} (\hat\alpha_n - \alpha_0) \le u \right\} - \Phi(u) \right| \le C\left( \frac{1}{\sqrt{n}} + \gamma_n \right).$

Thanks to Theorem 5.1, the proof of Theorem 5.2 borrows the adaptation of Pfanzagl's
method given in [48]. One of the main difficulties in this method is to obtain a Berry-Esseen
bound for the additive functionals $Y_n(p) := \sum_{k=1}^{n} g(p, X_{k-1}, X_k)$ with $p := (v,q,\alpha_0)$ and

g(p, X k ^,X k ) := F« ( ao , X k . u X k ) + ^-^§-{F^ (a , X k _ x ,X k ) - m{9)) 



for $|v| \le 2\sqrt{\ln q}$. Observe that we have from (V0)-(V2):

$\sup_{\{(v,q)\,:\,|v| \le 2\sqrt{\ln q}\},\ \theta\in\Theta} E_\theta\big[ |g(p, X_0, X_1)|^{3+\varepsilon} \big] < \infty.$

Then, Remark 5.1 gives the desired Berry-Esseen bound for $(Y_n(p))_{n\in\mathbb{N}}$ in a uniform way over
the parameter $(\theta,p)$.

When the $X_n$'s are i.i.d., Theorem 5.2 corresponds to Pfanzagl's theorem [74] up to
the following changes: in [74], $\pi_\theta$ is the common law of the $X_n$'s; the additive functional
is $M_n(\alpha) = (1/n) \sum_{k=1}^{n} F(\alpha, X_k)$; we simply have $\sigma_1^2(\theta) = E_\theta[F^{(1)}(\theta, X_0)^2]$ and
$\sigma_2^2(\theta) = E_\theta[(F^{(2)}(\theta, X_0) - m(\theta))^2]$, and finally Assumption (V0) is replaced by the weaker (and
optimal) moment condition: $\sup_{\theta\in\Theta} E_\theta\big[ |F^{(1)}(\theta, X_0)|^3 + |F^{(2)}(\theta, X_0)|^3 \big] < \infty$.

Earlier extensions of [74] to the Markov context are discussed in [48]. Let us compare our
result with that of [76], in which the family of transition probabilities $P_\theta$ is assumed to satisfy
a uniform Doeblin condition with respect to $\theta \in \Theta$. This condition corresponds to a uniform
$L^\infty$-spectral gap condition with respect to $\Theta$, which is stronger than our Condition (M) (see
Subsection 2.1). Let us mention that the moment condition on $F^{(1)}$ and $F^{(2)}$ in [76] is the
following ($\alpha(\theta) = \theta$ in [76]):

$\sup_{x\in X,\ \theta\in\Theta} E_\theta\big[ |F^{(1)}(\theta, X_0, X_1)|^3 + |F^{(2)}(\theta, X_0, X_1)|^3 \,\big|\, X_0 = x \big] < \infty.$

Because of the supremum over $x \in X$, this condition is in general much stronger than our
moment condition (V0) (despite the order $3+\varepsilon$ in (V0) instead of 3). To see that, neglect the
role of $\theta$ and consider a functional $f$ on $X$. Then the difference between the condition used in
[76] and (V0) is comparable to that between $\sup_{x\in X} E[|f(X_1)|^3 \mid X_0 = x]$ and $E_\pi[|f(X_1)|^{3+\varepsilon}]$
(or, equivalently, between the supremum norm $\|P(|f|^3)\|_\infty$ and the norm $\|f\|_{3+\varepsilon}$ of $f$ in
$L^{3+\varepsilon}(\pi)$). Consequently, Theorem 5.2 applies to the models considered in [76] but requires
weaker moment conditions.

Remark 5.3. The conclusion of Theorem 5.2 holds true when $X_0 \sim \mu_\theta$ and $\mu_\theta$ satisfies the
condition given in Remark 5.2. In this case, if $F$ is such that

$\sup E_\theta\big[ |F(\alpha, X_0, X_1)|^{1+\varepsilon} \big] < \infty$

for some $\varepsilon > 0$, then $M_\theta(\alpha) = E_\theta[F(\alpha, X_0, X_1)]$ can also be defined by (see Subsection 4.4):

$M_\theta(\alpha) = \lim_n E_{\theta,\mu_\theta}[M_n(\alpha)].$
6 Conclusion

In this paper, we propose limit theorems for the second component $(Y_t)_{t\in T}$ of a discrete
or continuous-time Markov Additive Process (MAP) $(X_t,Y_t)_{t\in T}$ when $(X_t)_{t\in T}$ has an
$L^2(\pi)$-spectral gap. The derivation of the CLT is based on a $\rho$-mixing condition strongly connected
to the $L^2(\pi)$-spectral gap property. The results related to the convergence rate in the CLT are
developed from the weak spectral method of [49]. Note that here the discrete and
continuous-time cases are covered in a unified way. In this context, the semigroup property (SG) for
the family of operators $(S_t(\zeta))_{t\in T}$ defined by $(S_t(\zeta)f)(x) := E_{(x,0)}[e^{i\langle\zeta,Y_t\rangle} f(X_t)]$ ($\zeta \in \mathbb{R}^d$,
$x \in X$) has a central role. We mention that this semigroup property is essentially true only
for MAPs. The impact of the results is expected to be high for models involving an
$L^2(\pi)$-spectral gap, since the limit theorems are valid for general (discrete and continuous time)
MAPs, and under optimal (or almost optimal) moment conditions. This is illustrated in
Section 5, where a Berry-Esseen bound for the M-estimator associated with $\rho$-mixing Markov
chains is derived under the (almost) expected moment condition.

A Additional material for the proof of Proposition 4.4

Here, we study the regularity properties of the functions $\zeta \mapsto L(\zeta, S_v(\zeta)f)$ and $\zeta \mapsto R_n(\zeta, S_v(\zeta)f)$
involved in the decomposition of Proposition 4.4.

1) Let us recall that we have (see (4.2))

$\forall \zeta \in \mathbb{R}^d,\ \forall n \in \mathbb{N},\quad E_{\pi,0}[e^{i\langle\zeta,Y_n\rangle} f(X_n)] = \pi(S_1(\zeta)^n f),$

and, for $\zeta$ in some open neighbourhood $\mathcal{O}$ of $\zeta = 0$ (see [49, 7.2]),

$S_1(\zeta)^n = \lambda(\zeta)^n \Pi(\zeta) + N(\zeta)^n,$

where $\lambda(\zeta)$ is the dominant eigenvalue of $S_1(\zeta)$, $\Pi(\zeta)$ is the associated rank-one
eigenprojection and $N(\zeta)$ is a bounded linear operator on each $L^p(\pi)$, $1 < p < \infty$. Both equalities imply
that

$E_{\pi,0}[e^{i\langle\zeta,Y_n\rangle} f(X_n)] = \lambda(\zeta)^n \pi(\Pi(\zeta)f) + \pi(N(\zeta)^n f).$ (A.1)

Furthermore the eigenprojection $\Pi(\zeta)$ and the operators $N(\zeta)^n$ are defined as in the standard
perturbation theory by

$\Pi(\zeta) = \frac{1}{2i\pi} \oint_{\Gamma_1} (z - S_1(\zeta))^{-1}\, dz,\qquad N(\zeta)^n = \frac{1}{2i\pi} \oint_{\Gamma_0} z^n (z - S_1(\zeta))^{-1}\, dz,$

where these line integrals are considered respectively on some oriented circle $\Gamma_1$ centered at
$z = 1$, and on some oriented circle $\Gamma_0$ centered at $z = 0$, with radius $\kappa < 1$ where $\kappa$ is (for
instance) $(1 + \exp(-\varepsilon))/2$ with $\varepsilon$ defined in (2.2).
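
The contour-integral definition of the eigenprojection can be tried out numerically on a finite state space. The sketch below is a toy illustration under stated assumptions: a two-state chain, the additive functional $Y_n = g(X_1) + \dots + g(X_n)$, a circle of radius 0.4 around $z = 1$ (chosen here so that it isolates the dominant eigenvalue of this particular example), and hypothetical function names. It approximates $\Pi(\zeta)$ by the trapezoidal rule on the circle and recovers the splitting $S_1(\zeta)^n = \lambda(\zeta)^n\Pi(\zeta) + N(\zeta)^n$ with $N(\zeta) = S_1(\zeta) - \lambda(\zeta)\Pi(\zeta)$.

```python
import numpy as np

def fourier_kernel(a, b, g, zeta):
    """S_1(zeta) for Y_n = g(X_1)+...+g(X_n) on a two-state chain:
    S_1(zeta)(x, y) = P(x, y) * exp(i * zeta * g(y)) (toy model)."""
    P = np.array([[1.0 - a, a], [b, 1.0 - b]], dtype=complex)
    return P * np.exp(1j * zeta * np.asarray(g, dtype=float))[None, :]

def riesz_projection(S, center=1.0, radius=0.4, n_nodes=400):
    """Approximate (1/(2 i pi)) * contour integral of (z - S)^{-1} dz over the
    circle |z - center| = radius, using the trapezoidal rule."""
    I = np.eye(S.shape[0], dtype=complex)
    total = np.zeros_like(S, dtype=complex)
    for k in range(n_nodes):
        w = radius * np.exp(2j * np.pi * k / n_nodes)   # z - center on the circle
        total += w * np.linalg.inv((center + w) * I - S)
    return total / n_nodes

def spectral_split(S):
    """Return (lam, Pi, N) such that S^n = lam^n * Pi + N^n for n >= 1."""
    Pi = riesz_projection(S)
    lam = np.trace(S @ Pi)          # Pi has rank one, so S Pi = lam * Pi
    return lam, Pi, S - lam * Pi
```

At $\zeta = 0$ the projection returned by `riesz_projection` is the matrix whose rows all equal the invariant distribution, and `lam` equals 1; for small $\zeta \neq 0$ both are smooth perturbations of these values.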

2) Let us return to the continuous-time case. We obtain from (4.10) and (A.1)

$\forall \zeta \in \mathcal{O},\quad E_{\pi,0}[e^{i\langle\zeta,Y_t\rangle} f(X_t)] = \lambda(\zeta)^n \pi\big(\Pi(\zeta)(S_v(\zeta)f)\big) + \pi\big(N(\zeta)^n(S_v(\zeta)f)\big).$

Thus, we can write with the notations introduced in Proposition 4.4

$\forall \zeta \in \mathcal{O},\quad L(\zeta, S_v(\zeta)f) := \pi\big(\Pi(\zeta)(S_v(\zeta)f)\big),\qquad R_n(\zeta, S_v(\zeta)f) := \pi\big(N(\zeta)^n(S_v(\zeta)f)\big).$

Therefore, we only need to study the regularity of the map $\zeta \mapsto (z - S_1(\zeta))^{-1} \circ S_v(\zeta)$ on
$\mathcal{O}$ for controlling that of the map $\zeta \mapsto E_{\pi,0}[e^{i\langle\zeta,Y_t\rangle} f(X_t)]$ on $\mathcal{O}$ (and, as a result, proving
Proposition 4.4).

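3) Recall that $\|\cdot\|_{p,p'}$ denotes the operator norm in the space $\mathcal{L}(L^p, L^{p'})$ of the bounded
linear operators from $L^p(\pi)$ to $L^{p'}(\pi)$. The notation $W(\cdot) \in C^j(\theta,\theta')$ means that there
exists a bounded open neighborhood $\mathcal{V}$ of $\zeta = 0$ in $\mathbb{R}^d$ such that:

$\forall \zeta \in \mathcal{V}$, $W(\zeta) \in \mathcal{L}(L^\theta, L^{\theta'})$ and $W : \mathcal{V} \to \mathcal{L}(L^\theta, L^{\theta'})$ has a continuous $j$-order differential
on $\mathcal{V}$.

Let us introduce the maps $U : \zeta \mapsto (z - S_1(\zeta))^{-1}$ and $V : \zeta \mapsto S_v(\zeta)$. We are going to apply
the next obvious regularity property. Let $1 \le \theta_{2m_0+2} < \theta_{2m_0+1} < \dots < \theta_1 < \theta_0 < \infty$ (note
that $L^{\theta_0} \subset L^{\theta_1} \subset \dots \subset L^{\theta_{2m_0+1}} \subset L^{\theta_{2m_0+2}}$), and assume that we have:

$U \in C^0(\theta_{2m_0+1}, \theta_{2m_0+2}) \cap C^1(\theta_{2m_0-1}, \theta_{2m_0+2}) \cap \dots \cap C^{m_0-1}(\theta_3, \theta_{2m_0+2}) \cap C^{m_0}(\theta_1, \theta_{2m_0+2}),$

$V \in C^0(\theta_0, \theta_1) \cap C^1(\theta_0, \theta_3) \cap \dots \cap C^{m_0-1}(\theta_0, \theta_{2m_0-1}) \cap C^{m_0}(\theta_0, \theta_{2m_0+1}).$

Then $UV \in C^{m_0}(\theta_0, \theta_{2m_0+2})$.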

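4) Let us introduce the following (non-increasing) maps from $[1,+\infty)$ to $\mathbb{R}$:

$T_0(\theta) := \frac{\alpha\theta}{\alpha + \varepsilon_0\theta}$ and $T_1(\theta) := \frac{\alpha\theta}{\alpha + \theta},$

where $\varepsilon_0$ will be defined in (A.4). Let $\theta > 1$. Lemma 4.4 and the continuous inclusions
between the Lebesgue spaces show that

$T_0(\theta) \ge 1 \ \Longrightarrow\ \forall \theta' \in [1, T_0(\theta)],\ S_v(\cdot) \in C^0(\theta,\theta').$ (A.2)

In the same way, Lemma 4.5 gives for $j = 1, \dots, m_0$:

$T_1^j(\theta) \ge 1 \ \Longrightarrow\ \forall \theta' \in [1, T_1^j(\theta)],\ S_v(\cdot) \in C^j(\theta,\theta'),$ (A.3)

and the derivatives in the last property are uniformly bounded in $v \in [0,1]$ on any bounded
open neighborhood of $\zeta = 0$.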

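Now set $\theta_0 := s$, $\theta_1 := T_0(s)$, and observe that the assumption on $s$ (i.e. $s > \alpha/(\alpha - m_0)$)
is equivalent to $T_1^{m_0}(\theta_0) = \alpha\theta_0/(\alpha + m_0\theta_0) > 1$, so that there exists $\varepsilon_0 > 0$ such that

$(T_0T_1)^{m_0}T_0(\theta_1) = (T_0T_1)^{m_0}T_0^2(\theta_0) = \frac{\alpha\theta_0}{\alpha + (m_0 + (m_0+2)\varepsilon_0)\theta_0} = 1.$ (A.4)

Define

$\theta_2 := T_0(\theta_1),\ \theta_3 := T_1T_0(\theta_1),\ \theta_4 := T_0T_1T_0(\theta_1),\ \dots,\ \theta_{2m_0+2} := (T_0T_1)^{m_0}T_0(\theta_1),$

namely: $\theta_{2j} := (T_0T_1)^{j-1}T_0(\theta_1)$ for $j = 1, \dots, m_0+1$, and $\theta_{2j+1} := T_1(T_0T_1)^{j-1}T_0(\theta_1)$
for $j = 1, \dots, m_0$. Note that $\theta_{2m_0+2} = 1$. From (A.2)-(A.3), $V(\cdot) := S_v(\cdot)$ satisfies the
regularity properties stated in part 3), and the corresponding derivatives (on any bounded
open neighborhood of $\zeta = 0$) are uniformly bounded in $v \in [0,1]$.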
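
The bookkeeping of the exponents $\theta_j$ becomes transparent in reciprocal coordinates. The following sketch assumes the forms $T_0(\theta) = \alpha\theta/(\alpha + \varepsilon_0\theta)$ and $T_1(\theta) = \alpha\theta/(\alpha + \theta)$, which are consistent with the identity $T_1^{m_0}(\theta_0) = \alpha\theta_0/(\alpha + m_0\theta_0)$ used above. Each application of $T_0$ (resp. $T_1$) adds $\varepsilon_0/\alpha$ (resp. $1/\alpha$) to $1/\theta$, and $\theta_{2m_0+2}$ is obtained from $\theta_0$ by applying $T_0$ exactly $m_0+2$ times and $T_1$ exactly $m_0$ times:

```latex
% Sketch under the assumed forms of T_0 and T_1 stated in the lead-in.
\begin{aligned}
\frac{1}{T_0(\theta)} &= \frac{1}{\theta} + \frac{\varepsilon_0}{\alpha},
\qquad
\frac{1}{T_1(\theta)} = \frac{1}{\theta} + \frac{1}{\alpha}, \\
\frac{1}{\theta_{2m_0+2}} &= \frac{1}{\theta_0} + \frac{m_0 + (m_0+2)\varepsilon_0}{\alpha} = 1
\iff
\varepsilon_0 = \frac{\alpha\,(1 - 1/\theta_0) - m_0}{m_0 + 2}.
\end{aligned}
```

With $\theta_0 = s$, this value of $\varepsilon_0$ is positive exactly when $s > \alpha/(\alpha - m_0)$, which is the assumption on $s$ in Proposition 4.4.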

Next, setting $I := \{\theta_1, \theta_2, \dots, \theta_{2m_0+2}\}$, it follows from (A.2)-(A.3) (with $v = 1$) that
condition $\mathcal{C}(m_0)$ of [49, 7.1] holds, so that the conclusions reported in [49, p. 48] are true:

$(H_0)$ if $\theta \in I$ and $T_0(\theta) \in I$, then $\zeta \mapsto (z - S_1(\zeta))^{-1} \in C^0(\theta, T_0(\theta))$ uniformly in $z \in \Gamma_0 \cup \Gamma_1$,

and for $\ell = 1, \dots, m_0$:

$(H_\ell)$ if $\theta \in \bigcap_{k=0}^{\ell} \big[ T_0^{-1}(T_0T_1)^{-k}(I) \cap (T_1T_0)^{-k}(I) \big]$, then $\zeta \mapsto (z - S_1(\zeta))^{-1} \in C^\ell(\theta, (T_0T_1)^\ell T_0(\theta))$
uniformly in $z \in \Gamma_0 \cup \Gamma_1$.

Therefore $U(\cdot) := (z - S_1(\cdot))^{-1}$ satisfies the regularity properties stated in part 3).

5) Finally, we deduce from the property of part 3) that there exists a neighbourhood $\mathcal{V}$
of $\zeta = 0$ in $\mathbb{R}^d$ such that the map $\zeta \mapsto (z - S_1(\zeta))^{-1} \circ S_v(\zeta)$ is $m_0$-times continuously
differentiable from $\mathcal{V}$ to $\mathcal{L}(L^s(\pi), L^1(\pi))$ uniformly in $z \in \Gamma_0 \cup \Gamma_1$, and furthermore we have
for $\ell = 0, \dots, m_0$: